Abstract

We analyze the efficiency of markets with friction, particularly power markets. We model the market
as a dynamic system with $(d_t;\ t \ge 0)$ the demand process and $(s_t;\ t \ge 0)$ the supply process. Using
stochastic differential equations to model the dynamics with friction, we investigate the efficiency of the
market under an integrated expected undiscounted cost function by solving the optimal control problem.
Then, we extend the setup to a game theoretic model where multiple suppliers and consumers interact 
continuously by setting prices in a dynamic market with friction. We investigate the equilibrium, and 
analyze the efficiency of the market under an integrated expected social cost function. We provide an 
intriguing efficiency-volatility no-free-lunch trade-off theorem. 

I. INTRODUCTION 

The first attempts at privatization and deregulation of the power industry took place in the 1980s,
starting in Chile and the UK [1]. After the restructuring of power markets in California in the late
1990s, price fluctuations resulted in an estimated $45 billion in higher electricity costs,
lost business due to long blackouts, and weakened economic growth, according to the Public
Policy Institute of California [2]. Even though such events have mostly been considered
market failures [3], [4], it was shown in [5] that the occurrence of choke-up prices (the maximum
price a consumer is willing to pay) is intrinsic to markets with friction, and that the market mechanism
is efficient in a stylized model. Whether they are efficient, intrinsic, or a market failure, choke-up
prices are observed in current market mechanisms; this is undesirable and costly.

Dynamic pricing in electricity markets has interesting characteristics. Locational Marginal
Pricing (LMP) schemes may determine very high prices for one region while a neighbouring region
is assigned a low or even a negative price for the same amount of energy, in which case the
supplier is actually willing to pay consumers for the power they use. Constraints due
to transmission congestion, voltage and thermal limits, Kirchhoff's laws, and start-up and
shut-down costs are the main reasons behind the excess and lack of supply that cause volatile
prices [6].

As shown in [5], the current deregulated market mechanism is efficient with respect to the
infinite horizon social cost. However, the definition of efficiency depends on the social cost
function chosen. Many models do not penalize volatility in their cost functions. One can define
volatility as rapid or unexpected changes in the price process. Several models of deterministic
and stochastic volatility have been studied in the economics literature, including the famous
deterministic Black-Scholes formula [7], the stochastic extension of Heston [8], and the SABR [9]
and GARCH [10] models. We adopt a much simpler definition of volatility, since
our goal is to give a specialized analysis of volatility in power markets.

Several authors have studied efficiency in power markets. Even though most studies are based
on static frameworks, it was shown in [11] that under ramping constraints, markets may face
prices that are not necessarily equal to the marginal cost price. A dynamic game model based on duopoly
markets is analyzed in [12], and in [13] a dynamic competitive equilibrium for a stochastic market model
is formulated and the role of volatility in the value of wind generation is presented.

We model the power market through continuous dynamics and an integrated undiscounted 
cost function. The problem is presented as an optimal control problem, and the control action 
is defined as an increment process applied by the regulator. The HJB equation is solved and the 
resulting optimal control is presented. As a special case, in the class of linear quadratic cost 
functions, we analytically show that there is a trade-off between efficiency and non-volatility. In 
the second part of the paper we take a decentralized approach and define the market as a dynamic
linear quadratic game among individual supplier and consumer agents, each acting as a decision maker. The agents
are coupled through the price process. We show that this price process can be estimated, and that
the agents can calculate their best response actions based on this estimate. We show that these
best response actions constitute an equilibrium, and that the trade-off theorem between efficiency and
non-volatility holds in this dynamic game model as well.

In the first part of the paper we suggest a dynamic optimization framework for power markets.
In Sec. II we introduce the centralized control model: the demand
$(d_t;\ t \ge 0)$, supply $(s_t;\ t \ge 0)$ and price $(p_t;\ t \ge 0)$ processes are defined for the social cost
optimizing regulator agent $R$ with the corresponding cost function. In Sec. III we present the
optimal control that leads to a volatile price process. In Sec. IV we define volatility and modify
the social cost function to account for it. We solve the dynamic stochastic optimization problem
for linear dynamics and a quadratic cost function and present the closed form solution. We show
that there is a quantifiable trade-off between efficiency and non-volatility, and present
supporting simulations. In the second part of the paper we suggest a dynamic game-theoretic
optimization framework. In Sec. V the consumer agents $D_i$, $1 \le i \le N^d$, with their dynamics
$(d_t^i;\ t \ge 0)$, the supplier agents $S_i$, $1 \le i \le N^s$, with their dynamics $(s_t^i;\ t \ge 0)$, and the price
process $(p_t;\ t \ge 0)$ are defined with the corresponding cost functions for the consumers and
suppliers. In this framework there is no regulator agent: the price process is determined solely
in the market mechanism through the actions of the consumers and suppliers [14]. In Sec. VI
we first show the existence of best response actions for the game model, then present the closed
form solutions, and finally analyze the equilibrium properties of the system. In Sec. VI-E
we define volatility for this model, and show that the trade-off theorem can be extended to the
multi-player game setup. We present supporting simulations in Sec. VI-F and conclude in Sec. VII.

II. MODEL 

In this section we define the optimization problem for the social cost optimizer in power
markets. Here we call the optimizer the "regulator" (agent $R$). We define the three dimensional
state process $(x_t : x_t = (d_t, s_t, p_t)^\top;\ t \ge 0)$, where $(d_t;\ t \ge 0)$ is the demand process, $(s_t;\ t \ge 0)$
is the supply process, and $(p_t;\ t \ge 0)$ is the price process. Demand and supply dynamics are defined
as

$$
\begin{aligned}
dd_t &= f^d(d_t, p_t)\,dt + \sigma_d\,dw_t^d, \quad t \ge 0,\\
ds_t &= f^s(s_t, p_t)\,dt + \sigma_s\,dw_t^s, \quad t \ge 0,
\end{aligned}
\tag{1}
$$

using deterministic continuous functions $f^d$ and $f^s$, and with $(w_t^d;\ t \ge 0)$ and $(w_t^s;\ t \ge 0)$ standard
Wiener processes. The function $f^d$ is allowed to be a function of $d$ and $p$, the values of demand
and price, and $f^s$ is allowed to be a function of $s$ and $p$, the values of the supply and price processes.

We employ the following assumptions on the functions $f^d$ and $f^s$ in (1). The first assumption,
A1, reflects friction in power markets. It ensures that the instantaneous change in the
demand and supply processes with respect to a price change is constrained. This is one of the
key properties of power dynamics: suppliers and consumers are unable to respond instantly to abrupt
changes in the system. The reason for the supplier's sluggishness is the slow ramp-up in
power production, whereas for consumers it is usually inconvenient, or very complicated and
costly, to start up and shut down a running machine or a household. The second assumption, A2,
reflects natural characteristics of demand and supply dynamics: demand is a decreasing function
of the price, whereas supply is an increasing function of the price.

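A1: For constant $C_1 > 0$, $|f^d(0,0)| \le C_1$, $|f^s(0,0)| \le C_1$ and

$$\left|\frac{\partial f^d}{\partial p}\right| + \left|\frac{\partial f^s}{\partial p}\right| \le C_1.$$

An immediate example is a linear function of the form $f(x,p) = A(t)x + B(t)p$ with $A$, $B$ of
class $C^1([0,T])$.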

A2: $f^d$ is a strictly decreasing function of $p$, whereas $f^s$ is strictly increasing in $p$.

This assumption ensures that an increase in price lowers the deterministic part of the
demand dynamics and raises the deterministic part of the supply dynamics.
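
As a hedged illustration of A1 and A2, consider the linear mean-reverting drifts that correspond to the discretized dynamics simulated in Sec. IV-D. Reading those difference equations as continuous-time drift functions is our interpretation, and $\rho$, $\beta$, $\gamma$ are the parameters quoted there:

```latex
\begin{align*}
f^d(d,p) &= -\rho\,\bigl(d - (\beta - p)\bigr), &
\frac{\partial f^d}{\partial p} &= -\rho < 0,\\
f^s(s,p) &= -\rho\,\bigl(s - (p - \gamma)\bigr), &
\frac{\partial f^s}{\partial p} &= \rho > 0,\\
|f^d(0,0)| &= \rho\beta, \qquad |f^s(0,0)| = \rho\gamma, &
\left|\frac{\partial f^d}{\partial p}\right| + \left|\frac{\partial f^s}{\partial p}\right| &= 2\rho .
\end{align*}
```

With $\rho = 0.05$, $\beta = 75$ and $\gamma = 25$ this gives $|f^d(0,0)| = 3.75$, $|f^s(0,0)| = 1.25$ and a total price sensitivity of $0.1$, so any $C_1 \ge 3.75$ bounds all of these quantities, and the signs of the price derivatives agree with A2.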

We also adopt the assumption below for the initial values of the processes and for the disturbance
processes:

A3: $\{d_0, s_0, p_0 \in \mathbb{R}\}$ are mutually independently distributed bounded initial conditions, and
$\{w^d, w^s\}$ are mutually independent and independent of the initial conditions. The instantaneous
variances of the disturbance processes, $\sigma_d^2$ and $\sigma_s^2$, are bounded.



We adopt the stepwise price adjustment model [15] for the optimizer (the so-called regulator agent
$R$), where the bounded input control process $(u_t;\ t \ge 0)$ controls the amount of the increment.
The price process controlled by agent $R$'s input is defined as

$$dp_t = u_t\,dt, \quad |u_t| \le u_{max}, \quad t \ge 0. \tag{2}$$

The action set of $R$ is $\{u : |u| \le u_{max},\ u \in \mathbb{R},\ u_{max} > 0\}$, which is simply the constrained
price adjustment. $R$ observes the demand and supply processes and, taking into consideration
their dynamics, the cost function and the constraint on the price increment, acts by
increasing or decreasing the power price. This action is intended to control the market dynamics
solely by applying increments to the price process.

Following [5], the individual loss functions of the consumer and the supplier are defined
respectively as

$$
\begin{aligned}
g^d(d,s,p) &= p \cdot s - v \cdot \min(d,s) + c_{bo}(r),\\
g^s(d,s,p) &= c(s) - p \cdot s.
\end{aligned}
$$

Here, $c(s) \in C^2 : \mathbb{R} \to \mathbb{R}_+$, with polynomial growth of power $k_1$ or less, i.e., $|c(s)| \le |c|(1 +
s^{k_1})$, where $C^2$ denotes the family of all bounded functions which are twice differentiable. The
function $c(s)$ is the production cost, and is strictly convex and strictly increasing with respect
to $s$. A reasonable power market model requires a realistic production cost function.
We note that in real power markets the production cost is not a convex
function: startup and shutdown costs, transmission line constraints and weather fluctuations all
affect it. However, if one neglects the startup and shutdown costs, the
cost function resembles a convex function [11, see Figure 1]. For our model we assume a
continuous convex cost. The constant $v \in (0, \infty)$ is the value the consumer obtains for a unit of
power. The blackout cost is denoted by $c_{bo}(r) \in C^2 : \mathbb{R} \to \mathbb{R}_+$, with polynomial growth of power
$k_2$ or less, i.e., $|c_{bo}(r)| \le |c_{bo}|(1 + r^{k_2})$; it is convex, zero on $[0, \infty)$ and strictly decreasing on
$(-\infty, 0)$, where $r$ denotes the reserve, $r := s - d$. In other words, if the total consumption in
the system cannot be met, a blackout cost is paid. In the spot market, the consumer $D$ pays $p \cdot s$,
the price of all the supply bought, to the supplier $S$. Note that $v$ is multiplied by the supplied
portion of the consumer's demand. The blackout cost $c_{bo}(\cdot)$ is a function of the unmet demand. Further note that
the supplier $S$ pays for all the cost of production, and gains the unit price multiplied by all the
units of supply bought by the consumer agent $D$. Finally, we employ the following integrated
expected social cost function, which is simply the sum of the loss functions of the consumer $D$ and
the supplier $S$ integrated in time:

$$J(x, u) = E \int_0^T \left[-v \cdot \min(d_t, s_t) + c(s_t) + c_{bo}(r_t)\right] dt. \tag{3}$$

In the section that follows, we study the optimal control problem defined by
the dynamics (1), the control (2) and the cost function (3) under A1, A2 and A3.
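
The cancellation behind (3) can be checked numerically: the payment $p \cdot s$ appears with opposite signs in the consumer and supplier losses, so their sum does not depend on the price. The sketch below is only illustrative. The quadratic forms chosen for `production_cost` and `blackout_cost`, and the constants `a`, `b` and `v`, are assumptions made here and are not taken from the paper; they merely satisfy the stated convexity and monotonicity requirements for $s \ge 0$.

```python
def production_cost(s, a=0.5):
    """Hypothetical production cost c(s) = a*s^2: strictly convex, increasing for s >= 0."""
    return a * s * s


def blackout_cost(r, b=10.0):
    """Hypothetical blackout cost: zero for reserve r >= 0, convex and decreasing for r < 0."""
    return b * min(r, 0.0) ** 2


def consumer_loss(d, s, p, v=100.0):
    """g^d(d, s, p) = p*s - v*min(d, s) + c_bo(s - d)."""
    return p * s - v * min(d, s) + blackout_cost(s - d)


def supplier_loss(d, s, p):
    """g^s(d, s, p) = c(s) - p*s."""
    return production_cost(s) - p * s


def social_loss(d, s, v=100.0):
    """Integrand of (3): -v*min(d, s) + c(s) + c_bo(s - d); the price has cancelled."""
    return -v * min(d, s) + production_cost(s) + blackout_cost(s - d)


# The sum of the two individual losses equals the social loss for any price.
for price in (0.0, 50.0, 80.0):
    total = consumer_loss(30.0, 25.0, price) + supplier_loss(30.0, 25.0, price)
    assert abs(total - social_loss(30.0, 25.0)) < 1e-9
```

Under these assumed cost shapes the price only redistributes cost between the two sides; it affects the social cost solely through its influence on the demand and supply dynamics.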

III. CENTRALIZED CONTROL FORMULATION 

In this section we analyze the optimal control problem in terms of the state vector $x :=
(d,s,p)^\top$. As stated before, this is a centralized control problem for the regulator agent $R$. In
principle $R$'s objective is to regulate the demand and supply processes using the price increments
as the control tool, so that the best social outcome is achieved. In this section we show that
the optimal control of the regulator is a "bang-bang" control, which leads to volatile prices. We
write (1) and (2) in vector form with stochastic dynamics as

$$dx = \psi\,dt + G\,dw, \quad t \ge 0, \tag{4}$$

where $w$ is a $3 \times 1$ standard Wiener process. We set $x := (d, s, p)^\top$, $\psi := (f^\top(\cdot), u)^\top$ and

$$
f(d,s,p) = \begin{pmatrix} f^d(d,p) \\ f^s(s,p) \end{pmatrix}, \qquad
G = \begin{pmatrix} \sigma_d & 0 & 0 \\ 0 & \sigma_s & 0 \\ 0 & 0 & 0 \end{pmatrix}.
$$

The loss function of (3) is rewritten here as $g(x) = g(d, s, p) = -v \cdot \min(d, s) + c(s) + c_{bo}(s - d)$.
The admissible control set for the regulator is specified as $\mathcal{U} = \{u(\cdot) : u$ adapted to $\sigma(x_s, s \le t)$
and $u(t) \in U = [-u_{max}, u_{max}],\ t \ge 0\}$. Therefore, the regulator can increase or decrease
the price at a rate of at most $u_{max}$ at any time. Finally, the cost associated with (4) and
a control $u$ is specified to be $J(x_0, u) = E\left[\int_0^T g(d_t, s_t, p_t)\,dt\right]$. Further, we set the value function

$$V(0, x_0) = \inf_{u \in \mathcal{U}} J(x_0, u). \tag{5}$$

The theorem that follows establishes the existence and uniqueness of the optimal control for the
problem (5).

Theorem 3.1: There exists a unique $\hat{u} \in \mathcal{U}$ such that $J(x_0, \hat{u}) = \inf_{u \in \mathcal{U}} J(x_0, u)$, where
$x_0 = (d_0, s_0, p_0)^\top$ is the initial state at time $t_0 = 0$, and if $\tilde{u} \in \mathcal{U}$ is another control such
that $J(x_0, \tilde{u}) = J(x_0, \hat{u})$, then $P(\tilde{u}_s \ne \hat{u}_s) > 0$ only on a set of times $s \in [0,T]$ of Lebesgue
measure zero.

Proof: The proof is given in Appendix I.

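Now that we have shown the existence and uniqueness of an optimal control, we look for approaches
to compute the optimal solution. For a function class $\mathcal{G}$: (i) $V \in C([0,T] \times \mathbb{R}^3)$, (ii) $|V| \le
C_V(1 + d^{k_1} + s^{k_2})$ where $C_V$, $k_1$, $k_2$ depend on $V$, (iii) $V(T,x) = 0$, we write the HJB Equation

$$
-\frac{\partial V}{\partial t} - \frac{\partial V}{\partial d} f^d(d,p) - \frac{\partial V}{\partial s} f^s(s,p)
+ \sup_{u \in U}\left\{-\frac{\partial V}{\partial p} u\right\}
- \frac{1}{2}\sigma_d^2 \frac{\partial^2 V}{\partial d^2} - \frac{1}{2}\sigma_s^2 \frac{\partial^2 V}{\partial s^2} - g(d,s,p) = 0. \tag{6}
$$

A classical solution to the HJB Equation (6) does not exist, as $GG^\top$ is not of full rank in (4)
[16]. Therefore, viscosity solutions are adopted.

Definition 3.1: Viscosity solution [17, Sec. 4, Def. 5.1]:

A function $v(t,x) \in C([0,T] \times \mathbb{R}^3)$ is a viscosity subsolution to the HJB equation (6) if
$v|_{t=T} \le 0$, and for any $\phi(t,x) \in C^{1,2}([0,T] \times \mathbb{R}^3)$, whenever $v - \phi$ attains a local maximum at
$(t,x) \in [0,T) \times \mathbb{R}^3$, we have

$$
-\frac{\partial \phi}{\partial t} - \frac{\partial \phi}{\partial d} f^d(d,p) - \frac{\partial \phi}{\partial s} f^s(s,p)
+ \sup_{u \in U}\left\{-\frac{\partial \phi}{\partial p} u\right\}
- \frac{1}{2}\sigma_d^2 \frac{\partial^2 \phi}{\partial d^2} - \frac{1}{2}\sigma_s^2 \frac{\partial^2 \phi}{\partial s^2} - g(d,s,p) \le 0. \tag{7}
$$

A function $v(t,x) \in C([0,T] \times \mathbb{R}^3)$ is called a viscosity supersolution to (6) if $v|_{t=T} \ge 0$, and
whenever $v - \phi$ attains a local minimum at $(t,x) \in [0,T) \times \mathbb{R}^3$, in (7) the inequality is changed
to "$\ge$". A value function $v(t,x)$ is a viscosity solution if it is both a viscosity subsolution and a
viscosity supersolution.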

Theorem 3.2: The value function defined in (5) is the unique viscosity solution to the HJB
equation (6) in the class $\mathcal{G}$.

Proof: H4.1 and H4.2 of [18] are satisfied. Theorem 4.1 of [18] proves that $V$ defined in
$\mathcal{G}$ is a viscosity solution to the HJB equation (6), and Theorem 4.3 of [18] proves that the
solution is the unique solution to (6) in the class $\mathcal{G}$. ■



A. Perturbation Method 



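In order to make the $GG^\top$ matrix full rank, we add the second-order term $(1/2)\epsilon^2(\partial^2 V/\partial p^2)$ to (6) [19]. For a
function class $\mathcal{G}'$: (i) $V \in C^{1,2}([0,T] \times \mathbb{R}^3)$, (ii) $|V| \le C_V(1 + d^{k_1} + s^{k_2})$ where $C_V$, $k_1$, $k_2$ depend
on $V$, (iii) $V(T,x) = 0$, we write the HJB Equation

$$
-\frac{\partial V^\epsilon}{\partial t} - \frac{\partial V^\epsilon}{\partial d} f^d(d,p) - \frac{\partial V^\epsilon}{\partial s} f^s(s,p)
+ \sup_{u \in U}\left\{-\frac{\partial V^\epsilon}{\partial p} u\right\}
- \frac{1}{2}\sigma_d^2 \frac{\partial^2 V^\epsilon}{\partial d^2} - \frac{1}{2}\sigma_s^2 \frac{\partial^2 V^\epsilon}{\partial s^2}
- \frac{1}{2}\epsilon^2 \frac{\partial^2 V^\epsilon}{\partial p^2} - g(d,s,p) = 0, \tag{8}
$$

where $V^\epsilon(T,x) = 0$.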



Lemma 3.3: [20, Sec. 6, Theorem 4] For each $k = 1, 2, \ldots$

$$E|x(t)|^k \le C_k\left(1 + E|x(s)|^k\right), \quad s \le t \le T,$$

where the constant $C_k$ depends on $k$, $T - s$, and $\psi$.

Lemma 3.4: [19, Lemma 6.2] Let $B \subset \mathbb{R}^3$ be bounded, and let $V^\epsilon$ be a solution of (8) in $C^{1,2}((0,T) \times
\mathbb{R}^3)$ that is continuous on $[0,T] \times \mathbb{R}^3$ with $V^\epsilon(T,x) = 0$. Then there exists a constant $M_B$
such that

$$|V^\epsilon(t,x)| \le M_B, \quad |V^\epsilon_x(t,x)| \le M_B \quad \text{for all } x \in B,\ 0 \le t \le T,$$

where the constant $M_B$ depends only on $B$, $T$, the constant $C_1$ in A1, and $c$, $c_{bo}$ defined for $c(s)$
and $c_{bo}(r)$.

September 20, 2011 DRAFT

Theorem 3.5: The perturbed HJB equation (8) has a unique classical solution in the class $\mathcal{G}'$
for all $\epsilon > 0$.

Proof: We employ an approximation approach. Let us first take $0 < \epsilon < 1$. For integer
$d \ge 1$, let $h^d(x)$ be such that $h^d(x) = 1$ for $|x| \le d$, $h^d(x) = 0$ for $|x| \ge d + 1$, and $|h^d_x| \le 2$.
Let $V^d$ be the solution to

$$
-\frac{\partial V^d}{\partial t} - \frac{\partial V^d}{\partial d} f^d(d,p)\,h^d(x) - \frac{\partial V^d}{\partial s} f^s(s,p)\,h^d(x)
+ \sup_{u \in U}\left\{-\frac{\partial V^d}{\partial p} u\right\} h^d(x) - \cdots = 0, \tag{9}
$$

where $V^d(T,x) = 0$.

For fixed $d_0 \ge 1$ and $D = (0,T) \times \{|x| < d_0\}$, for any $d > d_0$, $V^d(t,x)$ satisfies (8)
for $|x| < d_0$. Lemma 3.4 ensures that $V^d$ and $V^d_x$ are uniformly bounded on $D$. For any
$D' = (0,T) \times \{|x| < d'\}$, $0 < d' < d_0$, by local estimates

$$
\|V^d\|^{(2)}_{\lambda,D'} = \|V^d\|_{\lambda,D'} + \|V^d_t\|_{\lambda,D'} + \sum_{i=1}^{3} \|V^d_{x_i}\|_{\lambda,D'} + \sum_{i,j=1}^{3} \|V^d_{x_i x_j}\|_{\lambda,D'}
$$

is uniformly bounded, where $\|\cdot\|_{\lambda,D}$ denotes a Sobolev type $L^\lambda(D)$ norm, and $L^\lambda(K)$ denotes
the space of $\lambda$-th power integrable functions on $K$. Take $\lambda > 3$; by the Hölder estimates,
$V^d$ satisfies a uniform Hölder condition on any compact subset of $D'$. Moreover, $V^d_t$, $V^d_{x_i}$, $V^d_{x_i x_j}$, $d =
d_0 + 1, d_0 + 2, \ldots$, satisfy a uniform Hölder condition on such a $D'$. At this point we employ the
Arzelà-Ascoli theorem and take a subsequence $\{d_q;\ q \ge 1\}$ such that $V^{d_q}$, $V^{d_q}_t$, $V^{d_q}_{x_i}$, $V^{d_q}_{x_i x_j}$
converge uniformly to $V^\epsilon$, $V^\epsilon_t$, $V^\epsilon_{x_i}$, $V^\epsilon_{x_i x_j}$ on $D'$, respectively, as $q \to \infty$, where $V^\epsilon$ satisfies (8)
and is in the class $\mathcal{G}'$ due to the growth condition on $g$ and the compactness of $U$. In the next
theorem, we use Itô's formula to show that $V^\epsilon$ is the value function of a related stochastic
control system, and thus it is the unique solution to (8) in the class $\mathcal{G}'$. ■

Theorem 3.6: Let $x \in \mathbb{R}^3$ and $\epsilon > 0$. Define $V^\epsilon$ as the solution to (8) and $V$ as the solution to (6)
for the admissible control set $\mathcal{U}$. Then $V^\epsilon \to V$ uniformly on $[0,T]$ as $\epsilon \to 0$.

Proof: For $(v_t;\ t \ge 0)$ a standard Wiener process, we can define an alternative control action
in the form of a stochastic differential equation $dp^\epsilon_t = u_t\,dt + \epsilon\,dv_t$. The resulting value function
can be shown to be a viscosity solution to (8), and this solution is unique (see Chapter 4, [17]).
For a fixed $u$, we have $P\{\lim_{\epsilon \to 0} \sup_{0 \le t \le T} |p^\epsilon_t - p_t| = 0\} = 1$. We recall from (5) that $V^\epsilon(t,x)$ is
the infimum of $J^\epsilon(x,u)$ among non-anticipative controls in $\mathcal{U}$. Let $k_1$, $k_2$ be as in the polynomial
growth conditions for $c(\cdot)$ and $c_{bo}(\cdot)$. Since $U$ is compact, A1 together with Lemma 3.3 implies
that $E|x(t)|^{k_1}$ and $E|x(t)|^{k_2}$ are bounded uniformly with respect to $t \in [0,T)$ and $u \in \mathcal{U}$. It
follows that $J^\epsilon(x,u)$ is uniformly bounded. One can use Lebesgue's dominated convergence
theorem to obtain $|J^\epsilon(x,u) - J(x,u)| \to 0$ as $\epsilon \to 0$, and $V^\epsilon \to V$ as $\epsilon \to 0$ follows. By
applying the Arzelà-Ascoli theorem in the same way as in the proof of
Theorem 3.5, one obtains $V^\epsilon \to V$ uniformly on $[0,T]$ as $\epsilon \to 0$. ■

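This gives us the following result:

Corollary 1: For the function class $\mathcal{G}'$ the solution $u^* \in U$ to the perturbed HJB Equation (8)
is found as

$$
u^* = \arg\min_{u \in U} \left(\frac{\partial V^\epsilon}{\partial x}\right)^{\!\top} \psi = -\mathrm{sgn}\left(\frac{\partial V^\epsilon}{\partial p}\right) u_{max}, \tag{10}
$$

where $u$ was previously defined through $dp_t = u_t\,dt$, $t \ge 0$, $|u_t| \le u_{max}$.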

When we look at the perturbed HJB Equation (8), the bound $|V| \le C(1 + d^{k_1} + s^{k_2})$ is
a direct estimate, the value function is differentiable everywhere in the function class $\mathcal{G}'$, and
due to the constraint on the control action, the optimal control is a bang-bang
control. Hence, the optimal control is found as a single switch. At the boundary we have
$V(T,x) = 0$. Therefore, one can numerically solve (8).
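
The bang-bang form of (10) follows because the only term of the Hamiltonian that depends on $u$ is linear in $u$, so its minimum over the interval $[-u_{max}, u_{max}]$ sits at an endpoint. The minimal sketch below compares the closed form against a brute-force search over a grid of admissible increments. The function name `bang_bang_control`, the grid and the sample gradient values are illustrative assumptions, not part of the paper.

```python
import numpy as np


def bang_bang_control(dV_dp, u_max):
    """Closed form of (10): u* = -sgn(dV/dp) * u_max.

    When dV/dp == 0 every admissible u is optimal; np.sign returns 0 there.
    """
    return -np.sign(dV_dp) * u_max


u_max = 2.0
grid = np.linspace(-u_max, u_max, 201)  # admissible increments, endpoints included

for dV_dp in (-3.0, 0.5, 7.0):
    # Brute force: minimise the u-dependent part of the Hamiltonian, (dV/dp) * u.
    brute_force = grid[np.argmin(dV_dp * grid)]
    assert brute_force == bang_bang_control(dV_dp, u_max)
```

A positive sensitivity of the value function to price pushes the increment to $-u_{max}$, and a negative sensitivity pushes it to $+u_{max}$; no interior value is ever selected.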

In Theorem 3.1 we showed the existence of an optimal control for the problem (5). Because
the diffusion in (4) is degenerate, we have seen that a solution of
the HJB equation in the "classical sense" may not exist. This led us to formulate a perturbed, suboptimal
problem, and the convergence of its solution to the optimal solution was shown.

The control is shown to be a simple single switch. This has significant consequences:
the regulator needs to set the price increment to its maximum possible value or
to its minimum possible value, depending on the sign obtained from (10). Due to A1,
the effect of price on demand and supply is constrained. Therefore, a certain amount of time is
needed to adjust the levels of demand and supply in the system. In cases where demand
is much larger than supply, or supply is much larger than demand, the maximal increment has
to be applied for a long period of time. Hence, volatile prices are the optimal outcome of the
market with respect to the cost function (3).

Note that A1 is important both for technical reasons and for modeling reasons. In addition to
the fact that A1 models friction, if A1 is removed, the polynomial growth of the value function
also may not be satisfied. Moreover, for a hypothetical frictionless market, a single increment on
price would adjust demand and supply levels to the desired levels instantly; thus, less volatility 
would be expected. Indeed, for a completely deterministic frictionless system, volatility would 
be zero. 

IV. EFFICIENCY-VOLATILITY TRADE-OFF

Non-volatility and efficiency are two desirable properties of power markets. In this section we
show that these two notions conflict with each other in a market model with friction. Therefore,
one has to trade off non-volatility against efficiency in designing the market mechanism.

The optimal control policy for the system (1), and the price process that results from the nature of
the optimal control (10), were discussed in the previous section. Since the demand and supply
processes are defined by stochastic differential equations, they fluctuate along their trajectories, and
the regulator modifies the price process for the optimal outcome. The highest cost is paid when
the difference between demand and supply is the highest.

In this section we prove that no efficient regulation strategy can exist that maintains a smooth 
price process when supply and demand are defined by mean-reverting stochastic differential 
equations. 

We form a cost function that penalizes the control action $u$. Recall the loss function defined in (3).
We adopt the stepwise price adjustment model defined in (2), where the input control process
$(u_t;\ t \ge 0)$ controls the amount of the increment. The cost associated with the system is defined as

$$J(x_0, u) = E \int_0^T \left[g(d_t, s_t, p_t) + r u_t^2\right] dt, \tag{11}$$

where we add $r u_t^2$ to the integrand of (3) and $r > 0$ is the volatility coefficient. We will prove that if
the volatility coefficient decreases, the expected cost decreases. In other words, if high volatility
is not allowed, the social cost defined in (3) increases.

We define efficiency as the expected cost without the control penalizing part, multiplied by $-1$:
$-E \int_0^T g(d_t, s_t, p_t)\,dt$. Volatility, on the other hand,
is defined by the price fluctuation measured by $E \int_0^T u_t^2\,dt$.
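
Both quantities can be estimated from discretized sample paths by replacing the expectation with a sample mean and the time integral with a Riemann sum. The sketch below assumes that the running loss $g$ and the control $u$ have already been evaluated on a uniform time grid; the helper name `efficiency_and_volatility` and the array layout (one row per sample path) are choices made here for illustration.

```python
import numpy as np


def efficiency_and_volatility(g_paths, u_paths, dt):
    """Monte Carlo estimates of -E[int g dt] and E[int u^2 dt].

    g_paths, u_paths: arrays of shape (n_paths, n_steps) holding the running
    loss and the control on a uniform grid with step dt (assumed layout).
    """
    g_paths = np.asarray(g_paths, dtype=float)
    u_paths = np.asarray(u_paths, dtype=float)
    efficiency = -np.mean(np.sum(g_paths, axis=1) * dt)
    volatility = np.mean(np.sum(u_paths ** 2, axis=1) * dt)
    return efficiency, volatility


# Toy check: constant loss 1 and constant control 2 over a horizon of length 1.
eff, vol = efficiency_and_volatility(np.ones((2, 10)), 2.0 * np.ones((2, 10)), dt=0.1)
```

In this toy check the loss integrates to 1 and the squared control to 4, so the estimates are an efficiency of $-1$ and a volatility of $4$.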
We require one more assumption here:

A4: The supply process $(s_t;\ t \ge 0)$ and the demand process $(d_t;\ t \ge 0)$ are linear mean-reverting
processes that have bounded variances and admit stationary probability distributions in
the case of time invariant means.

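As a special case, we study a linear quadratic cost function of the form

$$J(x_0, u) = E \int_0^T \left(x_t^\top Q x_t + 2 x_t^\top D + r u_t^2\right) dt, \tag{12}$$

where $x := (d, s, p)^\top$, and $Q \ge 0$, $r > 0$ and $D$ are constant. Employing A4 we have
the dynamics

$$
\begin{aligned}
dx_t &= \psi(x_t, u_t)\,dt + G\,dw_t, \quad t \ge 0,\\
dx_t &= (A x_t + B u_t + h)\,dt + G\,dw_t, \quad t \ge 0,
\end{aligned}
\tag{13}
$$

where $w$ is a $3 \times 1$ standard Wiener process, $x(0) = x_0$, and $A$, $B$, $G$ are of the form

$$
A = \begin{pmatrix} * & 0 & * \\ 0 & * & * \\ 0 & 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad
G = \begin{pmatrix} \sigma_d & 0 & 0 \\ 0 & \sigma_s & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{14}
$$

where $*$ denotes a bounded constant.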



A. Existence and Uniqueness of the Optimal Control 

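From now on, we work with (12) and (13). We take the admissible control set $\mathcal{U}_2 = \{u :
u$ adapted to $\sigma(x_s, s \le t)$ and $\int_0^T u_t^2\,dt < \infty\}$. The minimum cost-to-go from any initial state
$x$ and any initial time $t$ is described by the value function, which is defined by $V(t,x) =
\inf_{u \in \mathcal{U}_2} J(x,u)$. The optimal control problem is well defined with the Hamilton-Jacobi-Bellman
(HJB) Equation

$$
-\frac{\partial V}{\partial t} + \sup_{u \in \mathcal{U}_2}\left\{-\left(\frac{\partial V}{\partial x}\right)^{\!\top} \psi - r u^2\right\}
- \frac{1}{2}\sigma_d^2 \frac{\partial^2 V}{\partial d^2} - \frac{1}{2}\sigma_s^2 \frac{\partial^2 V}{\partial s^2}
- x^\top Q x - 2 x^\top D = 0, \tag{15}
$$

where $V(T,x) = 0$.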

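As discussed earlier in Sec. III, due to the lack of uniform parabolicity, classical solutions
may be hard to obtain, and viscosity solutions are adopted in these circumstances. Therefore we add
the term $(1/2)\epsilon^2(\partial^2 V/\partial p^2)$ to (15) and obtain uniform parabolicity. Equation (15) then becomes

$$
-\frac{\partial V^\epsilon}{\partial t} + \sup_{u \in \mathcal{U}_2}\left\{-\left(\frac{\partial V^\epsilon}{\partial x}\right)^{\!\top} \psi - r u^2\right\}
- \frac{1}{2}\sigma_d^2 \frac{\partial^2 V^\epsilon}{\partial d^2} - \frac{1}{2}\sigma_s^2 \frac{\partial^2 V^\epsilon}{\partial s^2}
- \frac{1}{2}\epsilon^2 \frac{\partial^2 V^\epsilon}{\partial p^2}
- x^\top Q x - 2 x^\top D = 0, \tag{16}
$$

where $V^\epsilon(T,x) = 0$.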

Equation (16) has a unique solution, as stated in the following theorem.

Theorem 4.1: Equation (16) has a unique classical solution for the admissible control set $\mathcal{U}_2$
for all $\epsilon > 0$.

Proof: The proof is very similar to the proof of Theorem 3.5 and is therefore omitted. ■

In the theorem below, we prove that the solution to the perturbed equation (16) converges
uniformly to the value function obtained from the HJB Equation (15).

Theorem 4.2: Let $x \in \mathbb{R}^3$ and $\epsilon > 0$. Define $V^\epsilon$ as the solution to (16) and $V$ as the solution to (15)
for the admissible control set $\mathcal{U}_2$. Then $V^\epsilon \to V$ uniformly on $[0,T]$ as $\epsilon \to 0$.

Proof: The proof is very similar to the proof of Theorem 3.6 and is therefore omitted. ■

B. Closed Form Solution 



Standard arguments [21, Section 2.3] show that $J(x,u)$ is quadratic in $x$. Furthermore, at any
point $x \in \mathbb{R}^3$ and $t \in [0,T]$ the minimum cost-to-go is quadratic in $x$. Consequently, one can
model $V$ in the form $V(t,x) = x^\top K(t) x + 2 x^\top S(t) + q(t)$, which satisfies the boundary condition
$V(T,x) = 0$, $\forall x \in \mathbb{R}^3$. Substituting $V$ in (15) and applying first order optimization gives

$$u^*(t) = -r^{-1} B^\top \left[K(t) x(t) + S(t)\right]. \tag{17}$$

Solving the closed loop expression we get the ODEs:

$$\dot{K} + K A + A^\top K - K B r^{-1} B^\top K + Q = 0, \tag{18}$$
$$\dot{S} + (A - B r^{-1} B^\top K)^\top S + K h + D = 0, \tag{19}$$
$$\dot{q} + 2 S^\top h - S^\top B r^{-1} B^\top S + \mathrm{tr}(K G G^\top) = 0, \tag{20}$$

with boundary conditions $K(T) = 0$, $S(T) = 0$ and $q(T) = 0$. The linear quadratic optimal
control problem admits a unique optimum feedback controller given by (17), which attains the
minimum value of the cost function $J(x_0, u^*) = x_0^\top K(0) x_0 + 2 x_0^\top S(0) + q(0)$.
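
The ODEs (18)-(20) have terminal conditions, so a numerical solution integrates them backward from $t = T$. The sketch below uses explicit Euler steps purely for brevity; it is not presented as the method used in the paper. The matrices $A$, $B$, $h$ are read off from the discretized dynamics of Sec. IV-D, while the weight matrix `Q` (penalizing $(d-s)^2$), `D = 0`, and the step count are assumptions made here for illustration.

```python
import numpy as np


def solve_lq_odes(A, b, h, G, Q, D, r, T, n_steps=2000):
    """Integrate (18)-(20) backward in time with explicit Euler steps.

    b is the input vector B stored as a 1-D array, so B r^{-1} B^T = outer(b, b) / r.
    Returns K[k], S[k], q[k] on the grid t_k = k * T / n_steps.
    """
    n = A.shape[0]
    dt = T / n_steps
    K = np.zeros((n_steps + 1, n, n))  # terminal conditions K(T) = 0,
    S = np.zeros((n_steps + 1, n))     # S(T) = 0,
    q = np.zeros(n_steps + 1)          # q(T) = 0
    BrB = np.outer(b, b) / r
    GGt = G @ G.T
    for k in range(n_steps, 0, -1):
        Kk, Sk = K[k], S[k]
        # Time derivatives read off from (18), (19), (20).
        K_dot = -(Kk @ A + A.T @ Kk - Kk @ BrB @ Kk + Q)
        S_dot = -((A - BrB @ Kk).T @ Sk + Kk @ h + D)
        q_dot = -(2.0 * Sk @ h - Sk @ BrB @ Sk + np.trace(Kk @ GGt))
        K[k - 1] = Kk - dt * K_dot
        S[k - 1] = Sk - dt * S_dot
        q[k - 1] = q[k] - dt * q_dot
    return K, S, q


def feedback_control(K_t, S_t, x, b, r):
    """Optimal feedback (17): u* = -r^{-1} B^T [K x + S]."""
    return -(b @ (K_t @ x + S_t)) / r


# Hypothetical instance built from the simulation parameters of Sec. IV-D.
rho, beta, gamma, sigma = 0.05, 75.0, 25.0, 2.0
A = np.array([[-rho, 0.0, -rho], [0.0, -rho, rho], [0.0, 0.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
h = np.array([rho * beta, -rho * gamma, 0.0])
G = np.diag([sigma, sigma, 0.0])
Q = np.array([[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]])  # assumed: (d - s)^2
D = np.zeros(3)                                                        # assumed
K, S, q = solve_lq_odes(A, b, h, G, Q, D, r=1.0, T=100.0)
u0 = feedback_control(K[0], S[0], np.array([25.0, 25.0, 50.0]), b, r=1.0)
```

A higher-order integrator or a smaller step would be preferable when $r$ is very small, since the quadratic term $K B r^{-1} B^\top K$ then makes the Riccati equation stiff.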

C. Efficiency-Volatility Trade-off 

We would like to look at the relation between $r$, the volatility coefficient, and the state
penalizing part of the cost function, obtained when the volatility term is removed from the
cost function. We define the state penalizing cost as

$$J_{sp}(x_0, u^*) = E \int_0^T \left[x_t^\top Q x_t + 2 x_t^\top D\right] dt, \tag{21}$$

which is denoted as efficiency when multiplied by $-1$.



Theorem 4.3: Suppose A1-A4 hold. For all $x \in \mathbb{R}^3$, the state penalizing cost portion (21) of
the cost function (12) using the optimal control $u^*$ is an increasing function of $r$.

Proof: The proof is presented in Appendix II.

Increasing the volatility coefficient increases social cost, therefore decreases efficiency, while 
decreasing the coefficient decreases the cost, hence increases efficiency. On the other hand 
increasing the volatility coefficient decreases volatility, whereas decreasing volatility coefficient 
increases volatility. Therefore, there is a trade-off between social efficiency and non-volatility. 

D. Simulations 

1) Analytical Supportive Simulation: Here we simulate a power market. We use the Euler-Maruyama
method [22] to discretize the stochastic differential equations. The dynamics equations
are

$$
\begin{aligned}
d_{k+1} &= d_k - \rho\,\bigl(d_k - (\beta - p_k)\bigr)\,\Delta t + \sigma w_k^d \sqrt{\Delta t},\\
s_{k+1} &= s_k - \rho\,\bigl(s_k - (p_k - \gamma)\bigr)\,\Delta t + \sigma w_k^s \sqrt{\Delta t},\\
p_{k+1} &= p_k + u_k \Delta t,
\end{aligned}
$$

where $\rho = 0.05$, $\Delta t = 0.05$, $\beta = 75$, $\gamma = 25$, $\sigma = 2$, $t_{final} = 100$,
with the initial conditions $x_0 = (d_0, s_0, p_0)^\top = (25, 25, 50)^\top$. We use mean-reverting processes
with time varying means. The power market we simulate consists of a demand process with
mean $(75 - p)$ MW and a supply process with mean $(p - 25)$ MW. Therefore, for a price of
\$50 per MWh, the supplier is expected to produce 25 MW of power, and the demand in
the system is also expected to be 25 MW. In accordance with A2, the demand is a decreasing
function of price, whereas supply is increasing. We calculate $dJ_{sp}/dr$ using Theorem 4.3 for a
range of values of $r$ and present the result in Fig. 1; as expected, it is always positive. Also
as expected, it is a convex function of $r$: its value is very high for small values of $r$ and converges to
0 as $r$ increases. Increasing $r$, the volatility coefficient, corresponds to decreasing volatility, which
results in a cost increase since $dJ_{sp}/dr > 0$ for all $r > 0$. In Fig. 2 we present the trade-off
between efficiency and non-volatility. The numbers are normalized, and one can see that
in a market with higher volatility the efficiency is higher. Here, 0 on the x axis corresponds to
the situation where $r$ is very large and 1 corresponds to the situation where $r = 0$. On the y axis,
the corresponding values are normalized, so that 0 is the lowest and 1 is the highest efficiency
that can be obtained.

Fig. 1. $dJ_{sp}/dr$ (x axis: $r$)

Fig. 2. Trade-off (x axis: Volatility)
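
The Euler-Maruyama recursion of the analytical simulation can be sketched as follows, with the parameter values quoted above as defaults. The `policy` argument, the proportional rule used in the example call, the random seed and the function name `simulate_market` are hypothetical choices made for illustration; the paper's simulations use the optimal feedback (17) rather than this placeholder rule.

```python
import numpy as np


def simulate_market(policy, rho=0.05, dt=0.05, beta=75.0, gamma=25.0,
                    sigma=2.0, t_final=100.0, x0=(25.0, 25.0, 50.0), seed=0):
    """Euler-Maruyama simulation of demand d, supply s and price p.

    policy(t, d, s, p) returns the price increment rate u at time t.
    """
    rng = np.random.default_rng(seed)
    n = int(round(t_final / dt))
    d = np.empty(n + 1)
    s = np.empty(n + 1)
    p = np.empty(n + 1)
    d[0], s[0], p[0] = x0
    for k in range(n):
        u = policy(k * dt, d[k], s[k], p[k])
        w_d, w_s = rng.standard_normal(2)  # independent N(0, 1) draws
        d[k + 1] = d[k] - rho * (d[k] - (beta - p[k])) * dt + sigma * w_d * np.sqrt(dt)
        s[k + 1] = s[k] - rho * (s[k] - (p[k] - gamma)) * dt + sigma * w_s * np.sqrt(dt)
        p[k + 1] = p[k] + u * dt
    return d, s, p


# Hypothetical proportional rule: raise the price when demand exceeds supply.
d, s, p = simulate_market(lambda t, d, s, p: 0.1 * (d - s))

# Sanity check: without noise or control, the state (25, 25, 50) is an equilibrium.
d_eq, s_eq, p_eq = simulate_market(lambda t, d, s, p: 0.0, sigma=0.0)
```

The noiseless run confirms that the quoted initial condition is a rest point of the drift, which is why deviations in the stochastic runs are driven by the Wiener increments and the regulator's response to them.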

2) Numerical Simulation: Here we present a couple of simulations showing the dynamics 
when $r = 0.01$ and $r = 1000$. The high volatility in Fig. 3 compared to the low volatility in
Fig. 4 can be observed. One can also notice the effect of volatility on stability.

Also in Fig. 4 the optimal actions of the regulator agent can be observed at 4 points on the
trajectory. At P1, the demand goes up due to stochasticity and the regulator acts with full force
to increase the price, so that stability can be obtained. At P2, price gets high, and the demand 
is taken under control; gradually the regulator decreases the price. Between 60 seconds and 80 
seconds, we see that supply follows a higher level than the demand. The regulator acts to take 
the price down to a local minimum at P3. Then, until P4 the regulator gradually increases the 
price until it comes to a local maximum at P4. 

Now we present two more simulations with r = 1. The effect of the initial state on the 
trajectory is observed here. In Fig. 5, initially, demand is higher than the supply, whereas in Fig.
6 demand is lower than the supply. As expected, the price process becomes very volatile in early
stages to stabilise the market. 

Finally, we present an experimental result showing the relation between $r$ and the average
absolute difference between supply and demand dynamics. Recall that high costs are paid when
this difference is high, and as seen in Fig. 7, as $r$ increases the average absolute difference
increases. The x axis is drawn on a logarithmic scale in order to capture the graph at lower
values of $r$.

Fig. 3. Dynamics when $r = 0.01$

Fig. 4. Dynamics when $r = 1000$

Fig. 5. Dynamics when initial supply is higher than demand

Fig. 6. Dynamics when initial supply is lower than demand



V. DECENTRALIZED CONTROL FORMULATION 

We define a continuous dynamic game for $N^d$ consumers and $N^s$ suppliers. The agents
continuously submit their bids as price-quantity graphs, and the system announces the resulting
price. Agents buy or sell the corresponding shares of supply according to their bids. One important
notion is that future demand and supply processes depend on the price process, which is
determined instantly by the agents' price-quantity graphs shaped by their actions.

Fig. 7. Average absolute difference between demand and supply

We have the set of agents $\mathcal{N} = \{D_1, \dots, D_{N^d}, S_1, \dots, S_{N^s}\}$. We define the family of three-dimensional state processes $\{(d^i_t, s^{d_i}_t, p^{d_i}_t)^T;\ t \ge 0,\ 1 \le i \le N^d\}$ for the consumers and two-dimensional state processes $\{(s^i_t, p^{s_i}_t)^T;\ t \ge 0,\ 1 \le i \le N^s\}$ for the suppliers. The initial conditions $\{d^i_0, s^{d_i}_0, p^{d_i}_0,\ 1 \le i \le N^d;\ s^j_0, p^{s_j}_0,\ 1 \le j \le N^s\}$ are mutually independently distributed bounded random variables which are independent of the standard Wiener processes $\{w^{d_i}_t,\ 1 \le i \le N^d;\ w^{s_j}_t,\ 1 \le j \le N^s;\ t \ge 0\}$. The process $d^i_t$ is the demand dynamics for agent $D_i$, the process $s^{d_i}_t$ is the supply it receives, and the process $p^{d_i}_t$ is the parameter it applies to its pre-announced price-quantity graph function $\phi^{d_i}(p_t; p^{d_i}_t)$. For the supplier side $s^j_t$ is the current supply and $p^{s_j}_t$ is the parameter for the price-quantity graph $\phi^{s_j}(p_t; p^{s_j}_t)$. Here $\{\phi^{d_i},\ 1 \le i \le N^d\}$ and $\{\phi^{s_i},\ 1 \le i \le N^s\}$ are the price-quantity graphs that the consumers and the suppliers submit to the market clearing price functional $f^m(\cdot) \in C_b$ for the determination of the instant price $p_t$. The dynamics for the consumers and the suppliers for $t \ge 0$ are given as 

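$$
\begin{aligned}
dd^i_t &= f^{d_i}\big(d^i_t, p_t, \phi^{d_i}(p_t; p^{d_i}_t)\big)\,dt + \sigma_d\,dw^{d_i}_t, && 1 \le i \le N^d, \\
dp^{d_i}_t &= u^{d_i}_t\,dt, && 1 \le i \le N^d, \\
ds^j_t &= f^{s_j}\big(s^j_t, p_t, \phi^{s_j}(p_t; p^{s_j}_t)\big)\,dt + \sigma_s\,dw^{s_j}_t, && 1 \le j \le N^s, \\
dp^{s_j}_t &= u^{s_j}_t\,dt, && 1 \le j \le N^s, \\
p_t &= f^m\big(\{\phi^{d_i}(\cdot,\cdot),\ 1 \le i \le N^d;\ \phi^{s_j}(\cdot,\cdot),\ 1 \le j \le N^s\}\big).
\end{aligned}
\tag{22}
$$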

The actions of the agents $\{u^{d_i},\ 1 \le i \le N^d;\ u^{s_j},\ 1 \le j \le N^s\}$ control the size of the increments for $\{p^{d_i},\ 1 \le i \le N^d;\ p^{s_j},\ 1 \le j \le N^s\}$. The functional $f^{d_i}$, $1 \le i \le N^d$, is allowed to be a function of $d^i$, $p$ and $\phi^{d_i}(\cdot)$, the values of the demand of the consumer agent $D_i$, the price and its price-quantity graph; and $f^{s_i}$, $1 \le i \le N^s$, is allowed to be a functional of $s^i$, $p$ and $\phi^{s_i}(\cdot)$, the values of the supply of the supplier $S_i$, the price and its price-quantity graph. 

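Following [5], the individual loss functions of a consumer and a supplier are defined respectively: 

$$
\begin{aligned}
g^{d_i}(\cdot) &= p_t \cdot s^{d_i}_t - v \cdot \min(d^i_t, s^{d_i}_t) + c_{bo}(s^{d_i}_t - d^i_t), \\
g^{s_i}(\cdot) &= c(s^i_t) - p_t \cdot s^i_t.
\end{aligned}
\tag{23}
$$

Finally, the cost functions associated with each consumer, each supplier and corresponding control actions $u^{d_i}$, $1 \le i \le N^d$, and $u^{s_j}$, $1 \le j \le N^s$, are specified to be 

$$
\begin{aligned}
J_d(d^i_0, p_0, u^{d_i}) &= \mathbb{E}\int_0^T \big[p_t \cdot s^{d_i}_t - v \min(d^i_t, s^{d_i}_t) + c_{bo}(s^{d_i}_t - d^i_t)\big]\,dt, && 1 \le i \le N^d, \\
J_s(s^i_0, p_0, u^{s_i}) &= \mathbb{E}\int_0^T \big[c(s^i_t) - p_t \cdot s^i_t\big]\,dt, && 1 \le i \le N^s.
\end{aligned}
\tag{24}
$$
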

We employ A3 for initial values and the disturbance processes, and A1 on the functions $f^{d_i}(\cdot)$ and $f^{s_i}(\cdot)$. Moreover, 

A5: $f^{d_i}(\cdot)$, $1 \le i \le N^d$, is a strictly decreasing function of $p$, whereas $f^{s_i}(\cdot)$, $1 \le i \le N^s$, is strictly increasing. The price-quantity graphs for the consumers are decreasing functions of $p_t$ in the form of $\phi^{d_i}(p_t; p^{d_i}_t) = f^{d_i,\phi}(p^{d_i}_t) - p_t$, whereas the price-quantity graphs are increasing in the form of $\phi^{s_i}(p_t; p^{s_i}_t) = f^{s_i,\phi}(p^{s_i}_t) + p_t$ for the suppliers. The functions $f^{d_i,\phi}(p^{d_i}_t)$ and $f^{s_i,\phi}(p^{s_i}_t)$ are Lipschitz continuous on $\mathbb{R}$ with Lipschitz constants $Lip(f^{d_i,\phi})$, $1 \le i \le N^d$, and $Lip(f^{s_i,\phi})$, $1 \le i \le N^s$. Consequently, for some $\gamma > 0$, $\eta > 0$, the market clearing price function $f^m(\cdot) \in C_b : \mathbb{R} \to \mathbb{R}$ is a linear function in the form of 

$$
f^m = \frac{\gamma}{N^d + N^s} \cdot \Big(\sum_{i=1}^{N^d} f^{d_i,\phi}(\cdot) + \sum_{i=1}^{N^s} f^{s_i,\phi}(\cdot) + \eta\Big).
$$

This assumption limits the model to a price process parameterized by $\gamma > 0$ and $\eta > 0$ obtained by price-quantity graph functions submitted by the consumer and supplier agents, $\phi^{d_i}(\cdot)$, $1 \le i \le N^d$, and $\phi^{s_i}(\cdot)$, $1 \le i \le N^s$, that are linear functions of $p_t$, $t \ge 0$. 
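
As a rough illustration of the linear clearing rule stated in A5, the sketch below evaluates the average-type price formula for a given set of bid parameters. The function name `clearing_price`, the bid maps passed as `f_d` and `f_s`, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Sequence


def clearing_price(
    consumer_params: Sequence[float],
    supplier_params: Sequence[float],
    f_d: Callable[[float], float],
    f_s: Callable[[float], float],
    gamma: float,
    eta: float,
) -> float:
    """Linear clearing price: gamma / (N_d + N_s) * (sum f_d + sum f_s + eta).

    consumer_params / supplier_params hold the bid parameters p^{d_i}, p^{s_j};
    f_d and f_s are hypothetical stand-ins for the Lipschitz bid maps in A5.
    """
    n_agents = len(consumer_params) + len(supplier_params)
    if n_agents == 0:
        raise ValueError("at least one agent is required")
    total = sum(f_d(p) for p in consumer_params) + sum(f_s(p) for p in supplier_params)
    return gamma / n_agents * (total + eta)


def identity(p: float) -> float:
    """Simplest Lipschitz bid map (assumption for the example)."""
    return p
```

With identity bid maps, `gamma = 1` and `eta = 0`, the rule reduces to the plain average of the submitted parameters, which makes the role of $\gamma$ as a gain on the aggregate bid easy to see.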

A4 is employed: the demand processes $\{(d^i_t, t \ge 0);\ 1 \le i \le N^d\}$ and the supply processes $\{(s^i_t, t \ge 0);\ 1 \le i \le N^s\}$ are linear mean-reverting processes that have bounded variances. As a special case, we consider linear quadratic functions below. The choice of quadratic terms can be explained by the convexity of the production cost and the blackout cost functions. The rest of the cost functions can be arranged in a way that fits the linear parameters of the quadratic cost functions defined below. We use a penalty function for the control actions and define the volatility coefficient $r$. Increasing the volatility coefficient penalizes each agent's attempt to change its price-quantity functional; therefore increasing $r$ is equivalent to penalizing volatility in the market when the system of agents is taken as a mass. The nonlinear curve-fitting problem is solved in the least-squares sense given the input data and the observed output data, and we get the following cost functions: 

$$
\begin{aligned}
J_d(x^{d_i}_0, u^{d_i}) &= \mathbb{E}\int_0^T \Big[(x^{d_i}_t)^T Q^d x^{d_i}_t + 2 (x^{d_i}_t)^T D^{d_i}_t + r (u^{d_i}_t)^2\Big]\,dt, && 1 \le i \le N^d, \\
J_s(x^{s_i}_0, u^{s_i}) &= \mathbb{E}\int_0^T \Big[(x^{s_i}_t)^T Q^s x^{s_i}_t + 2 (x^{s_i}_t)^T D^{s_i}_t + r (u^{s_i}_t)^2\Big]\,dt, && 1 \le i \le N^s,
\end{aligned}
\tag{25}
$$

where $x^{d_i} := (d^i, s^{d_i}, p^{d_i})^T$, $x^{s_i} := (s^i, p^{s_i})^T$, $Q^d, Q^s \ge 0$ and $r > 0$ are constant values, $D^{d_i}_t$ is a continuous vector valued function of $\{x^{d_j}_t,\ 1 \le j \le N^d,\ j \ne i;\ x^{s_j}_t,\ 1 \le j \le N^s\}$, and $D^{s_i}_t$ is a continuous vector valued function of $\{x^{d_j}_t,\ 1 \le j \le N^d;\ x^{s_j}_t,\ 1 \le j \le N^s,\ j \ne i\}$. The cost functions are coupled: the price functional (dependent on all agents' actions) enters into the cost function parameters. Employing A4, the equation system (22) can be written in the form of 

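$$
\begin{aligned}
dx^{d_i}_t &= \psi^d(x^{d_i}_t, u^{d_i}_t)\,dt + G^d\,dw^{d_i}_t \\
&= \big(A^{d_i} x^{d_i}_t + B^d u^{d_i}_t + h^{d_i}_t\big)\,dt + G^d\,dw^{d_i}_t, && t \ge 0, \\
dx^{s_i}_t &= \psi^s(x^{s_i}_t, u^{s_i}_t)\,dt + G^s\,dw^{s_i}_t \\
&= \big(A^{s_i} x^{s_i}_t + B^s u^{s_i}_t + h^{s_i}_t\big)\,dt + G^s\,dw^{s_i}_t, && t \ge 0,
\end{aligned}
\tag{26}
$$

where $\{w^{d_i}_t,\ 1 \le i \le N^d;\ w^{s_i}_t,\ 1 \le i \le N^s;\ t \ge 0\}$ are standard Wiener processes with suitable dimensions and $x^{d_i}(0) = x^{d_i}_0$, $x^{s_i}(0) = x^{s_i}_0$. The function $h^{d_i}_t$ is of the form $h^{d_i}(p^{d_j}_t,\ 1 \le j \le N^d,\ j \ne i;\ p^{s_j}_t,\ 1 \le j \le N^s)$ and the function $h^{s_i}_t$ is of the form $h^{s_i}(p^{s_j}_t,\ 1 \le j \le N^s,\ j \ne i;\ p^{d_j}_t,\ 1 \le j \le N^d)$. 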

The coefficients B^''^] E & E IR"^"^"^), will be called the dynamics parameters. 

The variability of dynamics parameters from agent to agent is used to model a heterogeneous 
population of agents. Note that the dynamics are coupled among agents only through the price 
functional. The price functional enters into $h^{d_i}$ and $h^{s_i}$, and is a function of all agents' 
price-quantity graph functions. The following assumption is employed: 

A6: The set of dynamics parameters, ©, is a compact set in the form of © C 

VI. EQUILIBRIUM ANALYSIS 

In this section we first define the value functions for the consumers and the suppliers with the 
dynamics (26) and the cost functions (25). We then show the existence of suboptimal solutions 
to the HJB equations using the perturbation method. Secondly, we present the closed form 
solutions for the HJB equations and the statistical dependence among the agents through the 
price process. This dependence leads to an implementation issue which is overcome by a policy 
iteration style method applied by each agent to calculate the best response action. Finally, we 
show the existence of a unique subgame perfect equilibrium of the dynamic game under a fixed 
point argument in a system of agents where each agent applies the policy iteration method. 

A. Existence and Uniqueness of the Best Response Actions 



From now on, we consider (25) and (26). We define the admissible control set, $\mathcal{U}_s$, of each consumer and supplier as the set of all feedback controls adapted to $\mathcal{F}_t$, the $\sigma$-field generated by the agents' trajectories and the price process $\{x^{d_i}_\tau, x^{s_j}_\tau, p_\tau;\ 0 \le \tau \le t,\ 1 \le i \le N^d,\ 1 \le j \le N^s\}$. The minimum cost-to-go from any agent's initial state is described by the value functions, which are defined by $V^{d_i}(0, x^{d_i}_0) = \inf_{u \in \mathcal{U}_s} J_d(x^{d_i}_0, u^{d_i})$, $1 \le i \le N^d$; $V^{s_i}(0, x^{s_i}_0) = \inf_{u \in \mathcal{U}_s} J_s(x^{s_i}_0, u^{s_i})$, $1 \le i \le N^s$. Whenever the treatment is the same for both consumers' and suppliers' value functions we will drop the superscripts. The value function solves the Hamilton-Jacobi-Bellman (HJB) equation: 

$$
-\frac{\partial V}{\partial t} + \sup_{u \in \mathcal{U}_s}\Big\{-\psi(x,u)^T \frac{\partial V}{\partial x} - r u^2\Big\} - \frac{1}{2}\,\mathrm{tr}\Big(G G^T \frac{\partial^2 V}{\partial x^2}\Big) - x^T Q x - 2 x^T D = 0, \tag{27}
$$

where $V(T, x) = 0$. 

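As discussed before in Section III, due to the lack of uniform parabolicity [17], classical solutions may be hard to obtain. Viscosity solutions are typically adopted in these circumstances. In order to approximate the solution we add the term $(1/2)\epsilon^2(\partial^2 V/\partial p^2)$ to (27) and obtain uniform parabolicity. We obtain the perturbed value functions $V^{d}_\epsilon$ and $V^{s}_\epsilon$: 

$$
-\frac{\partial V^d_\epsilon}{\partial t} + \sup_{u \in \mathcal{U}_s}\Big\{-\psi^d(x^d,u)^T \frac{\partial V^d_\epsilon}{\partial x^d} - r u^2\Big\} - \frac{1}{2}\,\mathrm{tr}\Big(G^d G^{dT} \frac{\partial^2 V^d_\epsilon}{\partial (x^d)^2}\Big) - \frac{1}{2}\epsilon^2 \frac{\partial^2 V^d_\epsilon}{\partial (p^d)^2} - x^{dT} Q^d x^d - 2 x^{dT} D^d = 0, \tag{28}
$$

$$
-\frac{\partial V^s_\epsilon}{\partial t} + \sup_{u \in \mathcal{U}_s}\Big\{-\psi^s(x^s,u)^T \frac{\partial V^s_\epsilon}{\partial x^s} - r u^2\Big\} - \frac{1}{2}\,\mathrm{tr}\Big(G^s G^{sT} \frac{\partial^2 V^s_\epsilon}{\partial (x^s)^2}\Big) - \frac{1}{2}\epsilon^2 \frac{\partial^2 V^s_\epsilon}{\partial (p^s)^2} - x^{sT} Q^s x^s - 2 x^{sT} D^s = 0, \tag{29}
$$

where $V^d_\epsilon(T, x) = 0$ and $V^s_\epsilon(T, x) = 0$. 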



Eqs. (28) and (29) have unique solutions as stated in the following theorem. 

Theorem 6.1: For all $\epsilon > 0$, Equations (28) and (29) have unique solutions for the admissible control set $\mathcal{U}_s$. 

Proof: The proof is very similar to the proof of Theorem 3.5 and is therefore omitted. 

In the theorem below, we prove that the solutions $V^d_\epsilon$ and $V^s_\epsilon$ of the perturbed equations (28), (29) converge uniformly to the value function $V$ obtained from the HJB Equation (27). 

Theorem 6.2: For $x \in \mathbb{R}^3$ or $x \in \mathbb{R}^2$ suitably and $\epsilon > 0$, if we define $V_\epsilon$ as the solution to (28) or (29) and $V$ as the solution to (27) for the admissible control set $\mathcal{U}_s$, then $V_\epsilon \to V$ uniformly on $[0,T]$ as $\epsilon \to 0$. 

Proof: The proof is very similar to the proof presented for Theorem 3.6 and is therefore omitted. 



B. Closed Form Solution 



Standard arguments [21, Section 2.3] show that $J(x,u)$ is quadratic in $x$. Furthermore, at any point $x \in \mathbb{R}^3$ or suitably $x \in \mathbb{R}^2$ and $t \in [0,T]$, the minimum cost-to-go is quadratic in $x$. Consequently, $V$ is of the form $V(0, x) = x^T K(0) x + 2 x^T S(0) + q(0)$, which satisfies the boundary condition $V^{d}(T, x^{d}) = 0$, $\forall x^{d} \in \mathbb{R}^3$ and $V^{s}(T, x^{s}) = 0$, $\forall x^{s} \in \mathbb{R}^2$. 

Following the same steps in Sec. IV-B we obtain 

$$
u^*(t) = -r^{-1} B^T \big[K(t) x(t) + S(t)\big]. \tag{30}
$$

$K$ and $S$ in (30) are iterated backwards in time, and depend on other agents' actions on $0 \le t \le T$. This implies that at time $t \ge 0$, an agent cannot calculate its best response simply through its own trajectory and the control action history of all the agents on $0 \le s \le t$. The agents are coupled through the price process, and the full trajectory of the price process needs to be calculated in order to obtain the best response action. The analysis of the best response of each agent and the corresponding equilibrium is presented in the next section. 
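
To make the backward iteration behind (30) concrete, the following is a minimal sketch for a single agent with a given exogenous trajectory of $h$ and $D$. It assumes the standard linear-quadratic tracking equations implied by the quadratic ansatz $V = x^T K x + 2x^T S + q$ with zero terminal conditions, namely $-\dot K = A^T K + K A - K B r^{-1} B^T K + Q$ and $-\dot S = (A - B r^{-1} B^T K)^T S + K h + D$; the paper's own equations (18) and (19) are not reproduced in this section, so this form, the explicit Euler scheme, and the names `solve_backward` and `best_response` are assumptions for illustration.

```python
import numpy as np


def solve_backward(A, B, Q, r, h, D, T, dt):
    """Integrate K(t) and S(t) backwards from K(T) = 0, S(T) = 0 (explicit Euler).

    A: (n, n), B: (n, 1), Q: (n, n), r: positive scalar.
    h, D: callables t -> array of shape (n,), the exogenous coupling terms.
    Returns arrays K[k], S[k] on the grid t_k = k * dt.
    """
    n = A.shape[0]
    steps = int(round(T / dt))
    K = np.zeros((steps + 1, n, n))
    S = np.zeros((steps + 1, n))
    BBt = B @ B.T / r
    for k in range(steps, 0, -1):
        t = k * dt
        # Right-hand sides of -dK/dt and -dS/dt evaluated at t_k.
        minus_dK = A.T @ K[k] + K[k] @ A - K[k] @ BBt @ K[k] + Q
        A_cl = A - BBt @ K[k]
        minus_dS = A_cl.T @ S[k] + K[k] @ h(t) + D(t)
        # Step backwards in time: K(t - dt) ~ K(t) + dt * (-dK/dt).
        K[k - 1] = K[k] + dt * minus_dK
        S[k - 1] = S[k] + dt * minus_dS
    return K, S


def best_response(B, r, K_t, S_t, x_t):
    """Feedback law of the form (30): u = -r^{-1} B^T (K x + S)."""
    return (-(B.T @ (K_t @ x_t + S_t)) / r).item()
```

In the game, `h` and `D` depend on the other agents through the price process, which is why each agent needs a candidate price trajectory before this backward pass can be run.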

C. Subgame Perfect Equilibrium 

In this section we analyze the equilibrium properties. At each time iteration and at each point in the state space each agent solves the ODEs (18), (19) and (20) for all the consumers and the suppliers in the system, calculates their best response actions (30), and simultaneously solves these equations for each agent to obtain the unique fixed point in the action space. As stated before, the admissible control set is $\mathcal{U}_s$, the set of all feedback controls adapted to $\mathcal{F}_t$, the $\sigma$-field generated by the agents' trajectories and the price process $\{x^{d_i}_\tau, x^{s_j}_\tau, p_\tau;\ 0 \le \tau \le t,\ 1 \le i \le N^d,\ 1 \le j \le N^s\}$. Each individual agent knows the dynamics and cost function parameters of all agents in the system. Therefore, at a certain time $t \ge 0$, and for a given point in the state space, each agent can solve the ODEs (18), (19) and (20) that depend on all agents, and get the unique fixed point for the action profile. The state processes of all agents in the system are statistically dependent due to the price functional; however, at each $t \ge 0$, given that all dynamics and state information is known, the best response calculations for all agents can be carried out independently by each agent in the system. We show that the system of equations regarding the best response actions of all agents in the system has a unique solution. We also present the policy iteration procedure that leads to the unique solution of the system of equations when applied by all agents in the system. Due to the stochasticity of the system dynamics (26), this procedure is repeated by each agent until the fixed point is obtained at each time iteration. The compactness of the parameter set and the boundedness of the price functional ensure the existence of the fixed point. 

A7: $[A, B] \in \Theta$ is controllable, $[Q^{1/2}, A]$ is observable, and $A_*$ is a Hurwitz matrix. For all $\theta \in \Theta$, all the eigenvalues of $A_*(\theta) = A(\theta) - B(\theta) r^{-1} B^T(\theta) K(\theta)$ have negative real part; $A_*$ is continuous over $\Theta$; there exist $k > 0$, $\rho > 0$ such that $\|e^{A_*(\theta) t}\| \le k e^{-\rho t}$, $\forall t \ge 0$. 

The closed form solution is written for $S$ and $K$ as 

$$
S(t) = \int_t^T e^{-A_*^T (t-\tau)} K(\tau)\, h(p(\tau))\,d\tau + \int_t^T e^{-A_*^T (t-\tau)} D(p(\tau))\,d\tau \triangleq T_1 p_t, \tag{31}
$$

where $T_2 p_t$ is the solution to the Riccati equation (18). 

Since the solution $S(t; \theta)$, $\theta \in \Theta$, to the ordinary differential equation (19), and the solution $K(t; \theta)$, $\theta \in \Theta$, to the Riccati equation (18) parameterized by $\theta \in \Theta$ are smooth functions of $\theta$ (see [23]), $S(t; \theta)$ and $K(t; \theta)$ satisfy the following lemma. 

Lemma 6.3: Under A3, A4, A5, A6 and A7, we have $T_1 p \in C_b[0, \infty)$ and $T_2 p \in C_b[0, \infty)$ for any $p(\cdot) \in C_b[0, \infty)$. 
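
Therefore $T_1$ and $T_2$ are bounded continuous maps. With the best response actions applied, assuming $x^{d_i}_0 = 0$, $1 \le i \le N^d$; $x^{s_i}_0 = 0$, $1 \le i \le N^s$, without loss of generality, in the closed loop we have 

$$
\begin{aligned}
\mathbb{E}\,x^{d_i}(t) &= -\int_0^t e^{A^d_*(t-\tau)} B^d r^{-1} B^{dT} S^{d}(p(\tau))\,d\tau + \int_0^t e^{A^d_*(t-\tau)} h^{d}(p(\tau))\,d\tau \triangleq T_3 p_t, \\
\mathbb{E}\,x^{s_i}(t) &= -\int_0^t e^{A^s_*(t-\tau)} B^s r^{-1} B^{sT} S^{s}(p(\tau))\,d\tau + \int_0^t e^{A^s_*(t-\tau)} h^{s}(p(\tau))\,d\tau \triangleq T_4 p_t.
\end{aligned}
\tag{32}
$$

Lemma 6.4: Under A3, A4, A5, A6 and A7, $T_3 p \in C_b[0, \infty)$ and $T_4 p \in C_b[0, \infty)$ for any $p(\cdot) \in C_b[0, \infty)$. 

Proof: Due to A7, $A_*$ is a Hurwitz matrix. Moreover, we have shown in Lemma 6.3 that $S(t; \theta)$ is bounded in $C_b[0, \infty)$. Therefore $T_3 p \in C_b[0, \infty)$ and $T_4 p \in C_b[0, \infty)$ follow. ■ 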

Now, we write the price function $f^m(\cdot)$ for $t \ge 0$: 

$$
p_t = f^m\big(\{\phi^{d_i}(p_t; p^{d_i}_t),\ 1 \le i \le N^d;\ \phi^{s_i}(p_t; p^{s_i}_t),\ 1 \le i \le N^s\}\big) \triangleq T_5 p_t. \tag{33}
$$

The following lemma establishes that $T_5$ defined above is a map from $C_b[0, \infty)$ to itself. 

Lemma 6.5: Under A3, A4, A5, A6 and A7, we have $T_5 p \in C_b[0, \infty)$ for any $p \in C_b[0, \infty)$. 

In the following theorem we show that $T_5$ has a fixed point. Following that, we deduce that in a system of consumers and suppliers with dynamics (26) and cost functions (25) the system has a unique equilibrium, and this equilibrium is indeed the unique subgame perfect equilibrium within Markovian strategies. At this point we introduce the following technical assumption: 



< 1, 



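where $\gamma > 0$ is specified in A5, $k > 0$ is specified in A7, $\|K^d(t)\| \le M_{K^d}$, $\|K^s(t)\| \le M_{K^s}$ for all $0 \le t \le T$, $M_{h^d} = \max_{1 \le i \le N^d} \|h^{d_i}\|$, $M_{h^s} = \max_{1 \le i \le N^s} \|h^{s_i}\|$, $M_{D^d} = \max_{1 \le i \le N^d} \|D^{d_i}\|$, $M_{D^s} = \max_{1 \le i \le N^s} \|D^{s_i}\|$, $M_{Lip(f^{d,\phi})} = \max_{1 \le i \le N^d} \|Lip(f^{d_i,\phi})\|$, $M_{Lip(f^{s,\phi})} = \max_{1 \le i \le N^s} \|Lip(f^{s_i,\phi})\|$, and $Lip(f^{d_i,\phi})$ and $Lip(f^{s_i,\phi})$ are the Lipschitz constants respectively for $f^{d_i,\phi}(\cdot)$ and $f^{s_i,\phi}(\cdot)$ specified in A5. 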

This technical assumption ensures the uniqueness of the price process. Note that there is a 
trade-off between $\gamma$ and $r$ in this inequality. A small $r$ means cheaper control actions for the 
agents; therefore, the price process is more likely to be volatile and eventually intractable. This 
assumption ensures tractability even for small values of $r$. Note that numerical results show that 
this assumption can be satisfied. 

Theorem 6.6: Under A3, A4, A5, A6, A7 and A8, the map $T_5 : C_b[0, \infty) \to C_b[0, \infty)$ has a unique fixed point which is uniformly Lipschitz continuous on $[0, \infty)$. 

Proof: The proof is given in Appendix III. 

The main result of this section immediately follows from Theorem 6.6. 

Corollary 2: Under A3, A4, A5, A6, A7 and A8, the expected value of the equation system (32) admits a unique bounded solution. 



D. Policy Iteration 

We now describe the iterative policy of an agent from its policy space. At time $t \in [0, T]$, for a fixed iteration number $k \ge 0$ and $\tau \in [t, T]$, suppose that there is an a priori $p_\tau(k) \in C_b[0, \infty)$. Then the best response action (30) of each agent is in the form of $u^*_\tau(k+1) = -r^{-1} B^T [K_\tau(k+1) x_\tau(k) + S_\tau(k+1)]$. Taking the same steps as in the previous section we get the recursion for $p_\tau(k)$ as $\mathbb{E}[p_\tau(k+1)] = T_5 p_\tau(k)$. The procedure can be applied for all $t \le \tau \le T$, and the recursion converges to a unique $p^*(\tau)$, $t \le \tau \le T$. Once the price trajectory is obtained, each agent is able to calculate its best response action (30). The existence and uniqueness of $p^*(\tau)$, $t \le \tau \le T$, are shown in Theorem 6.6 by use of a fixed point argument. This procedure is independently performed by each agent in the system at each time iteration. The following proposition may be proved by Theorem 6.6. 

Proposition 6.7: Under A3, A4, A5, A6, A7 and A8, $\lim_{k \to \infty} p_\tau(k) = p^*(\tau)$ for any $\tau \in [0, \infty)$, where $p^*$ is the solution to (22). 
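
The recursion above is a fixed-point (Picard) iteration on price trajectories. The sketch below illustrates only the mechanics of such an iteration on a discretized trajectory; the operator `apply_operator` is a hypothetical stand-in contraction (a gain `gamma < 1` applied to an exponentially filtered price, plus an offset `eta`), not the paper's map $T_5$, whose evaluation would require the backward and forward passes of all agents.

```python
import numpy as np


def apply_operator(p, dt, gamma=0.5, eta=25.0, rho=0.5):
    """Hypothetical contraction on discretized price paths (stand-in for T_5).

    y solves y' = -rho * y + rho * p with y(0) = 0 (explicit Euler), so the
    sup norm of y never exceeds that of p; the map eta + gamma * y is then a
    contraction with constant at most gamma < 1.
    """
    y = np.zeros_like(p)
    for k in range(len(p) - 1):
        y[k + 1] = y[k] + dt * (-rho * y[k] + rho * p[k])
    return eta + gamma * y


def iterate_price(p0, dt, tol=1e-10, max_iter=500):
    """Repeat p(k+1) = T p(k) until successive iterates agree in sup norm."""
    p = np.asarray(p0, dtype=float)
    for k in range(max_iter):
        p_next = apply_operator(p, dt)
        if np.max(np.abs(p_next - p)) < tol:
            return p_next, k + 1
        p = p_next
    raise RuntimeError("fixed-point iteration did not converge")
```

Because the stand-in operator is a contraction, the iteration reaches the same trajectory from any bounded initial guess, which mirrors the role the contraction condition plays for the price process in Theorem 6.6.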

Before we present the subgame perfect equilibrium theorem, we employ the assumption below: 

A9: Agents can only use Markovian strategies, i.e., we rule out many non-myopic subgame 
perfect equilibria mostly based on future punishments such as grim-trigger strategies in repeated 
and dynamic games [24]. 

A Markovian strategy $\gamma_i$ of a player is defined to be a strategy where for each $t$, $\gamma_i(t, x)$ depends on $\mathcal{F}_t$, the $\sigma$-field generated by the agents' trajectories and the price process $\{x^{d_i}_\tau, x^{s_j}_\tau, p_\tau;\ 0 \le \tau \le t,\ 1 \le i \le N^d,\ 1 \le j \le N^s\}$, only through $t$ and $\{x^{d_i}_t, x^{s_j}_t, p_t,\ 1 \le i \le N^d,\ 1 \le j \le N^s\}$. 

Let us define $\Gamma$ to be the class of mappings $\gamma$ from $[0, T]$ times the state space into $\mathbb{R}$ with the property that $u(t) = \gamma(t, x)$ is adapted to $\mathcal{F}_t$, the $\sigma$-field generated by the agents' trajectories and the price process $\{x^{d_i}_\tau, x^{s_j}_\tau, p_\tau;\ 0 \le \tau \le t,\ 1 \le i \le N^d,\ 1 \le j \le N^s\}$. A subgame perfect equilibrium of the dynamic game with the set of agents $\mathcal{N}$ with dynamics (22), and with the cost functions (24), is a strategy profile $\gamma^* \in \Gamma$ such that for any history $h$, the strategy profile $\gamma^*|_h$ is a Nash equilibrium of the subgame based on the history $h$. 

Under A9, the iterative update of agents' policies results in the system's unique subgame perfect equilibrium. 

Corollary 3: Under A1, A3-A9, for agents with dynamics (22), the action profile obtained when all agents apply (30) at any $t \ge 0$ with the iterative procedure described above is the unique subgame perfect equilibrium of the game. 

The system has a unique equilibrium within Markovian strategies, and the action profile 
corresponding to the equilibrium can be obtained by an iterative algorithm applied by each agent. 
The equilibrium is shown by use of a fixed point argument, and the procedure is explained by 
a policy iteration methodology. At each time iteration each agent looks at the future, estimates 
the price trajectory, and calculates the best response action. This procedure leads to the unique 
best response profile, and the equilibrium is obtained. 



E. Efficiency-Volatility Trade-off 

We would like to look at the relation between the social cost function and the action penalizing parameter, the volatility coefficient $r$. The social cost function is defined as $J = J(\{d^i, s^{d_i}, s^j, p, u^{d_i}, u^{s_j},\ 1 \le i \le N^d,\ 1 \le j \le N^s\}) = \sum_{i=1}^{N^d} J_d(d^i, s^{d_i}, p, u^{d_i}) + \sum_{j=1}^{N^s} J_s(s^j, p, u^{s_j})$, 

$$
J = \sum_{i=1}^{N^d} \mathbb{E}\int_0^T \Big(x^{d_i T}_t Q^d x^{d_i}_t + 2 x^{d_i T}_t D^{d_i}_t + r (u^{d_i}_t)^2\Big)\,dt + \sum_{i=1}^{N^s} \mathbb{E}\int_0^T \Big(x^{s_i T}_t Q^s x^{s_i}_t + 2 x^{s_i T}_t D^{s_i}_t + r (u^{s_i}_t)^2\Big)\,dt. \tag{34}
$$

We define efficiency as the quantity obtained when the social cost is multiplied by $-1$. Volatility, on the other hand, is defined by the price fluctuation measured by $\sum_{i=1}^{N^d} \mathbb{E}\int_0^T (u^{d_i}_t)^2\,dt + \sum_{i=1}^{N^s} \mathbb{E}\int_0^T (u^{s_i}_t)^2\,dt$. 

Then we remove $r u^2$ from the cost function and define the state penalizing social cost under the best response actions (30) applied both by the consumers and the suppliers. Note that the social cost is merely a summation of the same type of cost functions; therefore the analysis of a single summand term can be easily carried over to the whole cost function. The state penalizing social cost function is written: 

$$
J_{sp} = \sum_{i=1}^{N^d} \mathbb{E}\int_0^T \Big(x^{d_i T}_t Q^d x^{d_i}_t + 2 x^{d_i T}_t D^{d_i}_t\Big)\,dt + \sum_{i=1}^{N^s} \mathbb{E}\int_0^T \Big(x^{s_i T}_t Q^s x^{s_i}_t + 2 x^{s_i T}_t D^{s_i}_t\Big)\,dt. \tag{35}
$$

Theorem 6.8: For all $x \in \mathbb{R}^3$, or $x \in \mathbb{R}^2$ suitably, the state penalizing cost portion (35) of the cost function (34) using the best response solution $u^*$ is an increasing function of $r$. 

Proof: The proof is very similar to the proof of Theorem 4.3 and is therefore not provided. 

Corollary 4: Suppose A1, A3-A9 hold. Let the price adjustment of the consumers and the suppliers be penalized with a factor of the volatility coefficient. Increasing the volatility coefficient increases the integrated social cost, while decreasing the coefficient decreases the cost. 

Therefore, there is an inherent trade-off between social efficiency and non-volatility. 
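
The direction of the trade-off can be checked by hand in a scalar stand-in problem, which is much simpler than the coupled game above. The sketch below assumes a single state with dynamics $dx = (a x + b u)\,dt + g\,dw$, cost rate $q x^2 + r u^2$, no coupling terms, and steady-state feedback; the function name and all parameter values are illustrative assumptions. In this setting the closed-loop drift is $-\sqrt{a^2 + b^2 q / r}$, so the stationary state cost $q g^2 / (2\sqrt{a^2 + b^2 q / r})$ grows with $r$ while the variance of the control shrinks.

```python
import math


def steady_state_costs(r, a=-0.05, b=1.0, q=1.0, g=2.0):
    """Stationary state cost rate and control variance for a scalar LQ problem.

    Returns (q * Var[x], Var[u]) under the steady-state feedback u = -(b K / r) x.
    """
    # Positive root of the scalar algebraic Riccati equation
    # 2 a K - b^2 K^2 / r + q = 0.
    K = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    a_cl = a - b * b * K / r           # closed-loop drift, always negative
    var_x = g * g / (2.0 * -a_cl)      # stationary variance of the state
    state_cost = q * var_x             # state-penalizing part of the cost rate
    var_u = (b * K / r) ** 2 * var_x   # stationary variance of the control
    return state_cost, var_u


def tradeoff_table(r_values):
    """Evaluate the two quantities on a grid of volatility coefficients."""
    return [(r,) + steady_state_costs(r) for r in r_values]
```

Evaluating `tradeoff_table` on an increasing grid of `r` shows the state cost column rising and the control variance column falling, which is the scalar analogue of the efficiency-volatility trade-off stated above.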

F. Simulations 



Here we simulate a power market. We use the Euler-Maruyama method [22] for discretization 
of the stochastic differential equations. The dynamics equations for 1 < i < A^'^, 1 < j < 
are = 4 - p (4 - (/3 -p,)) At + aw^^t, p^+i = Pk + «fe At, = ^ - 

A^"*), where p = 0.05, At = 0.05, /3 = 75, a = 2, t final = 500, with the initial conditions 
{dtvi'V = (25,50)^, (4',Po = (25,50)^, po = 50. 
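
For readers who want to reproduce the flavour of these experiments, the following is a minimal Euler-Maruyama sketch for a single mean-reverting demand process. It assumes the demand update has the form $d_{k+1} = d_k - \rho\,(d_k - (\beta - p_k))\,\Delta t + \sigma \sqrt{\Delta t}\,\xi_k$ with standard normal $\xi_k$, which is one reading of the discretized dynamics above; the price path is supplied exogenously rather than generated by the agents' controls, and the function name `simulate_demand` is hypothetical.

```python
import numpy as np


def simulate_demand(d0, price_path, rho=0.05, beta=75.0, sigma=2.0, dt=0.05, seed=0):
    """Euler-Maruyama path of a demand process reverting towards beta - p_k.

    price_path: sequence of prices p_k, one per time step (exogenous here).
    Returns an array of length len(price_path) + 1 starting at d0.
    """
    rng = np.random.default_rng(seed)
    d = np.empty(len(price_path) + 1)
    d[0] = d0
    for k, p_k in enumerate(price_path):
        drift = -rho * (d[k] - (beta - p_k))
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        d[k + 1] = d[k] + drift * dt + noise
    return d
```

With the parameter values quoted above and a constant price of 50, the reversion level is $\beta - p = 25$, which matches the stated initial demand; switching the noise off (`sigma=0`) makes the path settle at that level.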

We present a couple of figures showing the dynamics when r = 0.005 and r = 100. The 
high volatility of the price in Fig. 8 compared to the low volatility of the price in Fig. 9 can 
be observed. One can also notice the effect of volatility on stability. In Fig. 8, the aggregate 
demand and aggregate supply dynamics follow a much closer path, whereas in Fig. 9, the gap 
between these two processes turns out to be fluctuating. As the highest costs are paid when the 
absolute difference between the aggregate demand and supply is the highest, the social cost paid 
in Fig. 9 is larger than in Fig. 8. 

VII. CONCLUSION 

The no friction assumption of the traditional market mechanisms is a significant factor in 
terms of analyzing dynamic continuous time markets. Friction adds complexity to the analysis 
of the market, and volatility is an inevitable result of friction as our stylized model shows. 

The market model has been defined as a social cost optimization problem for the regulator, 
and it has been shown that a simple linear closed form control exists. As a special case we 
studied quadratic cost functions and have shown that there is a trade-off between social efficiency 
of the market and non-volatility. High efficiency requires a volatile market, whereas one has to 
compromise efficiency in order to get stability in prices. Then we have extended the optimization 
model to a dynamic game theoretic framework and have shown the same efficiency-volatility 
trade-off for this model with closed form best response actions. The complexity increases as the 
number of agents in the system increases; therefore, this leads to an implementation problem. 
In future work we will apply mean field methods and study the limit behaviour of a large 
population model of suppliers and consumers to precompute the price process and thereby decrease 
the computational complexity. 

Also, in the current model, it is assumed that each agent in the system knows all the dynamical 
and cost function parameters of other agents in the system. Adaptation and learning methods 
will be applied in future work for a system of agents that start with partial statistical information 
on other agents' dynamical and cost function parameters. The adapting (or learning) agents will 
be allowed to only observe a random partial fraction of the population of agents' trajectories 
and the price process. Convergence rates will be studied and properties of the equilibrium will 
be investigated in this setup. The behaviour of the efficiency-volatility theorem will also be 
analyzed. 

Appendix I 
PROOF OF THEOREM O 

Proof: 

The existence of the solution follows a standard argument [17, Chapter 2, Theorem 5.1]. The 
continuity of V{x) is based on the continuous dependence of the cost function on the initial 
state values. 

The plan of the proof is as follows: first, to show the uniqueness of the control, we show that 
the cost function J is strictly convex in p. By contradiction: we show that the probability of two 
distinct control action trajectories leading to the same price process p is of Lebesgue measure 
zero. This proves uniqueness. 

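We have 

$$
J(t, x_t, u) = \mathbb{E}_t \int_t^T \big[-v \min(d_\tau, s_\tau) + c(s_\tau) + c_{bo}(d_\tau - s_\tau)\big]\,d\tau,
$$

which can be written for $\Delta t > 0$ as 

$$
J(t, x_t, u) = \big[-v \min(d_t, s_t) + c(s_t) + c_{bo}(d_t - s_t)\big]\,\Delta t + \mathbb{E} \int_{\Delta t}^T (\cdot)\,d\tau.
$$

The first part is independent of $p_t$. We have 

$$
J(t, x_t, u) = (\cdot) + \mathbb{E} \int_{\Delta t}^T \big[-v \min(d_\tau, s_\tau) + c(s_\tau) + c_{bo}(d_\tau - s_\tau)\big]\,d\tau.
$$

For $s = \tau - \Delta t$, 

$$
\begin{aligned}
J(t, x_t, u) = (\cdot) + \mathbb{E} \int_0^{T - \Delta t} \big[ & -v \min\big(d_s + f^d(d_s, p_s)\Delta t + dw^d_s,\ s_s + f^s(s_s, p_s)\Delta t + dw^s_s\big) \\
& + c\big(s_s + f^s(s_s, p_s)\Delta t + dw^s_s\big) \\
& + c_{bo}\big(d_s + f^d(d_s, p_s)\Delta t + dw^d_s - s_s - f^s(s_s, p_s)\Delta t - dw^s_s\big)\big]\,ds.
\end{aligned}
$$

Therefore, $J(t, x_t, u) = (\cdot) + \mathbb{E} \int_0^{T - \Delta t} [g_{s + \Delta t}]\,ds$. For $d_{s+\Delta t} < s_{s+\Delta t}$, as $p$ increases, $\mathbb{E} g_{s+\Delta t}$ also increases and tends to $\infty$ as $p \to \infty$. As $p$ decreases, $\mathbb{E} g_{s+\Delta t}$ decreases and tends to $g^*$ as $p \to p^*$, where $p^*$ is the price at which $\mathbb{E}_s d_{s+\Delta t} = \mathbb{E}_s s_{s+\Delta t}$. For $d_{s+\Delta t} > s_{s+\Delta t}$, as $p$ decreases, $g_{s+\Delta t}$ increases and tends to $\infty$ as $p \to -\infty$. As $p$ increases, $g_{s+\Delta t}$ decreases and tends to $g^*$ as $p \to p^*$, where again $\mathbb{E}_s d_{s+\Delta t} = \mathbb{E}_s s_{s+\Delta t}$. 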

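Let $u$ and $\tilde{u}$ be two optimal controls with corresponding price processes $p$ and $\tilde{p}$. We have $J(d_s, s_s, \tfrac{1}{2}(p_s + \tilde{p}_s), u) < \tfrac{1}{2}[J(d_s, s_s, p_s, u) + J(d_s, s_s, \tilde{p}_s, \tilde{u})]$, and the inequality holds on $A = \{(s, \omega) : p_s \ne \tilde{p}_s\}$. Let $\mathbb{E} \int_0^T 1_{\{p_s \ne \tilde{p}_s\}}\,ds > 0$, i.e., $A$ has a strictly positive measure. Then the control $(1/2)(u + \tilde{u}) \in \mathcal{U}$ gives 

$$
J\big(x_0, \tfrac{1}{2}(u + \tilde{u})\big) < \tfrac{1}{2}\big[J(x_0, u) + J(x_0, \tilde{u})\big] = \inf_{u \in \mathcal{U}} J(x_0, u),
$$

which is a contradiction. Therefore, $\mathbb{E} \int_0^T 1_{\{p_s \ne \tilde{p}_s\}}\,ds = 0$. The trajectories of $p_s$ are continuous with probability 1 by (2). Hence, we have $p_s - \tilde{p}_s = 0$ on $[0, T]$ with probability 1. We have $\int_0^s (u_\tau - \tilde{u}_\tau)\,d\tau = p_s - \tilde{p}_s$, for all $s \in [0, T]$, by (2). Hence, with probability 1, $u_s - \tilde{u}_s = 0$ a.e. on $[0, T]$, which is equivalent to $\mathbb{E} \int_0^T 1_{\{u_s \ne \tilde{u}_s\}}\,ds = \int_0^T \mathbb{P}(u_s \ne \tilde{u}_s)\,ds = 0$. Consequently, $\mathbb{P}(u_s \ne \tilde{u}_s) > 0$ only on a set of Lebesgue measure zero in $s \in [0, T]$. Therefore, uniqueness is proved. 
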



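Appendix II 
PROOF OF THEOREM 4.3 

Proof: If we were to only find $\partial J(x, u^*)/\partial r$, we would calculate it from 

$$
\frac{\partial J(x, u^*)}{\partial r} = x^T \frac{\partial K(0)}{\partial r} x + 2 x^T \frac{\partial S(0)}{\partial r} + \frac{\partial q(0)}{\partial r}.
$$

However, we are interested in $\partial J_{sp}(x, u^*)/\partial r$. 

The quadratic cost function was shown to be $J(x, u^*) = \mathbb{E} \int_0^T [x^{*T}(t) Q x^*(t) + 2 x^{*T}(t) D + r (u^*(t))^2]\,dt$. We seek to compute $\partial J_{sp}(x, u^*)/\partial r$. However, the calculations are easier for $\partial J_{sp}(x, u^*)/\partial \gamma$, where $\gamma = r^{-1}$. We have 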

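$$
\frac{d J_{sp}(x, u^*)}{dt}\Big|_{t=T} = f^J(x) = \big(x^{*T}(t, \gamma) Q x^*(t, \gamma) + 2 x^{*T}(t, \gamma) D\big)\Big|_{t=T}, \tag{36}
$$

with the initial condition $J_{sp}(x, u^*)|_{t=0} = 0$. We take the derivative of (36) with respect to $\gamma$ and, changing the order of differentiation, obtain 

$$
\frac{d}{dt}\Big(\frac{\partial J_{sp}(x, u^*)}{\partial \gamma}\Big) = \Big(\frac{\partial x^*(t, \gamma)}{\partial \gamma}\Big)^T Q x^*(t, \gamma) + x^{*T}(t, \gamma) Q \Big(\frac{\partial x^*(t, \gamma)}{\partial \gamma}\Big) + 2 \Big(\frac{\partial x^*(t, \gamma)}{\partial \gamma}\Big)^T D, \qquad \frac{\partial J_{sp}(x_0, u^*)}{\partial \gamma}\Big|_{t=0} = 0. \tag{37}
$$

In order to get (37), we need $\partial x^*(t, \gamma)/\partial \gamma$. Using (13) and (11) we obtain 

$$
\begin{aligned}
dx^*(t, \gamma) &= f^x\big(x^*(t, \gamma), \gamma, K(t, \gamma), S(t, \gamma)\big)\,dt + G\,d\omega \\
&= \big(A x^*(t, \gamma) - B \gamma B^T (K(t, \gamma) x^*(t, \gamma) + S(t, \gamma)) + h\big)\,dt + G\,d\omega, \qquad x(0) = 0.
\end{aligned}
\tag{38}
$$

Taking the derivative of (38) with respect to $\gamma$ and changing the order of differentiation, 

$$
\begin{aligned}
\frac{d}{dt}\Big(\frac{\partial x^*(t, \gamma)}{\partial \gamma}\Big) = {} & \big(A - B \gamma B^T K(t, \gamma)\big) \frac{\partial x^*(t, \gamma)}{\partial \gamma} - B \gamma B^T \frac{\partial K(t, \gamma)}{\partial \gamma} x^*(t, \gamma) \\
& - B B^T K(t, \gamma) x^*(t, \gamma) - B \gamma B^T \frac{\partial S(t, \gamma)}{\partial \gamma} - B B^T S(t, \gamma), \qquad \frac{\partial x^*(0, \gamma)}{\partial \gamma} = 0.
\end{aligned}
\tag{39}
$$
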



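In order to calculate (39), we need $\partial K(t, \gamma)/\partial \gamma$ and $\partial S(t, \gamma)/\partial \gamma$, where $K$ and $S$ evolve according to $dK(t, \gamma)/dt = f^K(K(t, \gamma), \gamma)$ and $dS(t, \gamma)/dt = f^S(S(t, \gamma), \gamma, K(t, \gamma))$. 

The differential equations for $K(t, \gamma)$, $0 \le t \le T$, and $S(t, \gamma)$, $0 \le t \le T$, were shown to be (18) and (19). Using the same technique of changing the order of differentiation, we get: 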

d ( dK{t,^) \ _ df^i-) dK{t,^) df^i-) dK{T) 



dt \ d'j J dK{t,'j) d'j d'j ^ d'j ^' ^^^^ 

d ( dSit,^) \ _ dfi-) dSjt,^) dfi-) dfi-) dKjt,^) dSjT) _ 
dt\ d-i ) dS{t,-f) rf7 97 dK{t,-f) d-f ' d-f ■ ^ ) 



The differential equations d40b and (41 1 are expanded as 



+ i^(t,7)57fiT (^^^^]^^ +ir(t,7)55^ir(t,7); and 

d / ^ _^-,d_S{lr^ ^ iB^B^K{t,^)y^-^ 
dt \ dj J 07 d'j 

+ (^)'575-5(t,7) + i^-(t)55-^(t,7) - '-^h. 



(42) 



(43) 



Therefore we have all the components to calculate $\partial x^*(t, \gamma)/\partial \gamma$ in (39). Consequently, we are ready to calculate $\partial J_{sp}(x, u^*)/\partial \gamma$ in (37). With $\gamma = f(r) = r^{-1}$, one can write 

$$
\frac{\partial J_{sp}(x, u^*)}{\partial \gamma} = \frac{\partial J_{sp}(x, u^*)}{\partial (f(r))} = \frac{\partial J_{sp}(x, u^*)}{f'(r)\,\partial r} = \frac{\partial J_{sp}(x, u^*)}{\partial r} \cdot \frac{-1}{r^{-2}} = -r^{2}\, \frac{\partial J_{sp}(x, u^*)}{\partial r}. \tag{44}
$$

From (44) one can calculate $\partial J_{sp}(x, u^*)/\partial r$. Here we use the specific nature of the matrices $A$ and $B$ given in (14) and show analytically that $\partial J_{sp}(x, u^*)/\partial r > 0$, and that it is an increasing function of time. Here we show the derivations calculated at the steady state values of $K$, $S$, and $x$. However, the calculations also hold for any $0 \le t \le T$. At steady state only the terms involving the noise variables affect $\partial J_{sp}(x, u^*)/\partial \gamma$; for all the other terms $\partial J_{sp}(x, u^*)/\partial \gamma = 0$. We solve the stochastic differential equation (39) and obtain 



dx*{t,'y) 
d'j 



\A-B-yB'^K(Tn))(.t-T) 



B^B'' ^^^Illl - BB^Kir, 7) ) x* (r, 7) 



-B-fB 



d'j 



BB^S{r,j) 



(45) 



dr. 



The stochastic differential equation for x* in ( |45| ) gives the solution 



a:*(r,7)= / e(^-^^^"^(^'^))(--^)(-i?75^5(s, 7) + /^) + / e(^-^^^"^(^'^))(--^)GrfW^(s). 
Jo Jo 

We now inject these two equations into ([37]), solve the stochastic differential equation, arrange 



the terms, and obtain 



^(A-B-yB^ K{r,^)y [s-t)^^^ 

Jo 



(46) 



dJsp{x,u*) 



d'j 



2 / TT{R2{t) * Q)dt. 



All the terms in (46) are positive except for $(-B \gamma B^T\, \partial K(\tau, \gamma)/\partial \gamma - B B^T K(\tau, \gamma))$. The matrix $B B^T$ is an all-zeros matrix except for 1 at the bottom-right entry. The multiplications with $\partial K(\tau, \gamma)/\partial \gamma$ and $K(\tau, \gamma)$ give positive values due to (14), where these quantities solve (40) and (18). Hence, $R_1(t) < 0$. 

Therefore, one obtains $\partial J_{sp}(x, u^*)/\partial \gamma < 0$ for all $\gamma > 0$. As $\partial J_{sp}(x, u^*)/\partial r = -(\partial J_{sp}(x, u^*)/\partial \gamma)\, r^{-2}$, one obtains $\partial J_{sp}(x, u^*)/\partial r > 0$ for all $r > 0$. 

Also, one can see the role of the noise variance in the trade-off. $G$ in (46) is the noise variance matrix; notice that an increase in the noise variance leads to an increase in $\partial J^*/\partial r$. 

Hence, the state penalizing cost portion (21) of the cost function (12) in the closed loop using the optimal control $u^*$ is an increasing function of $r$. Thus, Theorem 4.3 is proved. 



Appendix III 
PROOF OF THEOREM 6.6 

Proof: We rewrite the operator $T_5$ (33) here again: 



Pt 



Pt = 7 ■ ' 



+ T] 

$T_5$ is a map from the Banach space $C_b[0, \infty)$ to itself. For any $x, y \in C_b[0, \infty)$, injecting 
$\mathbb{E} x^{d_i}$, $\mathbb{E} x^{s_i}$ from (32) and $S(\cdot)$ from (31) we have 

Il(r5a;-r5|/)(t)|| 

Then we solve the integrals and obtain 
\\{%x-%ym\\ 



g ^^Xi^!'" (^IIBlV-'(M,.||ft*|| + ||Z)*||) + 



(f:/*"(^l|B''f''-'(' 



P 

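where $\|e^{A_*(\theta) t}\| \le k e^{-\rho t}$, $\forall t \ge 0$ due to A7, $\|K^d(t)\| \le M_{K^d}$, $\|K^s(t)\| \le M_{K^s}$ for all $0 \le t \le T$, $M_{h^d} = \max_{1 \le i \le N^d} \|h^{d_i}\|$, $M_{h^s} = \max_{1 \le i \le N^s} \|h^{s_i}\|$, $M_{D^d} = \max_{1 \le i \le N^d} \|D^{d_i}\|$, $M_{D^s} = \max_{1 \le i \le N^s} \|D^{s_i}\|$. All the bounds given above exist due to A6. 

Now employing A5 we have the bound below using the Lipschitz continuity of $f^{d_i,\phi}$, $1 \le i \le N^d$, and $f^{s_i,\phi}$, $1 \le i \le N^s$, with Lipschitz constants $Lip(f^{d_i,\phi})$, $1 \le i \le N^d$, and $Lip(f^{s_i,\phi})$, $1 \le i \le N^s$: 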

\\i%x-%ym\\ 

Now we insert $M_{Lip(f^{d,\phi})} = \max_{1 \le i \le N^d} \|Lip(f^{d_i,\phi})\|$, $M_{Lip(f^{s,\phi})} = \max_{1 \le i \le N^s} \|Lip(f^{s_i,\phi})\|$ and 
obtain 



\\i%x-%ym\\ 
k'^\\x - 



Then from A8 it follows that $T_5$ is a contraction and therefore has a unique fixed point $p \in C_b[0, \infty)$. 